Abstract: This paper is concerned with the construction of 
low-density parity-check (LDPC) codes with low error floors. 
Two main contributions are made. First, a new class of structured 
LDPC codes is introduced. The parity check matrices of these 
codes are arrays of permutation matrices which are obtained 
from Latin squares and form a finite field under some matrix 
operations. Second, a method to construct LDPC codes with low 
error floors on the binary symmetric channel (BSC) is presented. 
Codes are constructed so that their Tanner graphs are free of 
certain small trapping sets. These trapping sets are selected from 
the Trapping Set Ontology for the Gallager A/B decoder. They 
are selected based on their relative harmfulness for a given 
decoding algorithm. We evaluate the relative harmfulness of 
different trapping sets for the sum product algorithm (SPA) by 
using the topological relations among them and by analyzing the 
decoding failures on one trapping set in the presence or absence 
of other trapping sets. 

Index Terms: Trapping sets, structured low-density parity-check
codes, algebraic construction, Latin squares. 



I. Introduction 

Despite the fact that numerous results on construction 
of LDPC codes [1] have been published in the past 
few years, this research topic remains contemporary in the 
field of coding theory. Researchers have focused on two main 
problems: (i) deriving new classes of structured codes and (ii) 
constructing codes with low error floor performance. 

To be efficiently encodable and decodable, the parity check 
matrix of an LDPC code must be structured (hence the term 
structured code). The construction of structured LDPC codes 
relies on algebraic or combinatorial objects. In many cases, 
the parity check matrix of a structured LDPC code can 
be represented as an array of permutation matrices. If the 
permutation matrices are circulant permutation matrices then 
the code is quasi-cyclic (QC). Most researchers have focused on 
QC codes as these codes result in low encoding and decoding 
complexity. The encoding of these codes can be efficiently 
implemented using shift registers with linear complexity [2], 
while the decoding can be parallelized by exploiting the block 
structure of the parity check matrices [3], [4]. 

It is well-known that in order to achieve a reasonably good 
performance under iterative message passing decoding algo- 
rithms, the Tanner graph of an LDPC code must not contain 
cycles of length four. Numerous methods to form a parity 
check matrix such that its corresponding Tanner graph does 
not contain four cycles have been proposed. These methods 
ensure that any two rows (columns) of a parity check matrix 
have 1's in at most one common position. This constraint on 
parity check matrices is referred to in [5] as the row-column 
(RC) constraint. 
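
The RC constraint can be checked directly on a small binary matrix. The following is a minimal sketch, not part of the original construction; the function name and the two toy matrices are illustrative assumptions. It uses the fact that two rows sharing 1's in two positions is the same event as two columns sharing 1's in two positions, so only row pairs need to be inspected.

```python
from itertools import combinations


def satisfies_rc_constraint(H):
    """Return True if no two rows of H have 1's in more than one common position.

    Two rows overlapping in two columns form a cycle of length four in the
    Tanner graph, and the same cycle is seen as two columns overlapping in
    two rows, so checking rows covers the column condition as well.
    """
    supports = [{j for j, bit in enumerate(row) if bit} for row in H]
    return all(len(a & b) <= 1 for a, b in combinations(supports, 2))


# Hypothetical toy matrices, chosen only for illustration.
H_good = [
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [1, 0, 1, 0],
]
H_bad = [
    [1, 1, 0],
    [1, 1, 1],
]

print(satisfies_rc_constraint(H_good))
print(satisfies_rc_constraint(H_bad))
```

The check is quadratic in the number of rows, which is adequate for illustration but not intended for long codes.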

Algebraic methods of constructing QC LDPC codes usually 
exploit a one to one correspondence between an element of 
an algebraic structure, such as a group or a Galois field, and a 
circulant. This one to one correspondence translates the prob- 
lem of constructing a parity check matrix into the problem of 
constructing a matrix of elements from the algebraic structure. 
The RC constraint is converted to a simpler constraint on 
the second matrix. Notable work on algebraic constructions 
of LDPC codes includes (but is not limited to) [5]-[9], with 
the methods in [5] and [9] being the most relevant to the structured 
codes proposed in this paper. 

Combinatorial constructions of LDPC codes evolved from 
balanced incomplete block designs (BIBDs) [10]. In these 
constructions, a parity check matrix is obtained from a point- 
block incidence matrix of a BIBD: points represent parity- 
check equations while blocks represent bits of a linear block 
code. The RC constraint is satisfied by setting the parameters 
of the BIBD so that no two blocks contain the same pair of 
points. The first class of combinatorially constructed LDPC 
codes was introduced by Kou, Lin and Fossorier in [11]. These 
codes are closely related to finite-geometry codes, a well-studied
class of codes which is used in conjunction with one-step 
or multiple-step majority logic decoding. Other combinatorial 
methods of constructing LDPC codes were studied in great 
detail and summarized by Vasic and Milenkovic in [12]. 

In this paper, we give a new class of structured LDPC 
codes. The parity check matrices of these codes are arrays of 
permutation matrices which are obtained from Latin squares. 
These q x q matrices form a Galois field GF(q) under some 
matrix operations (introduced later in this paper). Hence, 
our codes are different from the codes proposed by Lan et 
al. [5], which utilize a one-to-one correspondence between 
a $(q-1) \times (q-1)$ circulant permutation matrix and an 
element of the multiplicative group of GF(q). The new class 
of codes contains array LDPC codes [9] when q is a prime, 
but includes higher rate codes than shortened array LDPC 
codes [13], [14], when the Tanner graphs are required to 
satisfy certain constraints. The description of the new class 
is not only concise and general but also makes the RC 
constraint trivial to satisfy. Above all, our permutation matrices 
are more general than circulants as the circulant property 
for our codes holds on indices understood as elements of 
GF(q). More specifically, with $\alpha$ a primitive element of GF(q), 
the permutation matrix corresponding to $\alpha^t \in \mathrm{GF}(q)$ sends 
the indices $(0, 1, \alpha, \ldots, \alpha^{q-2})$ to 
$(0 + \alpha^t, 1 + \alpha^t, \alpha + \alpha^t, \ldots, \alpha^{q-2} + \alpha^t)$. 
This new class of codes 
serves as a basis for a method of constructing codes with low 
error floor performance, which we shall now explain. 

By now, it is well established that the error floor phe- 
nomenon, an abrupt degradation in the error rate performance 
of LDPC codes in the high signal-to-noise-ratio (SNR) region, 
is due to the presence of certain structures in the Tanner graph 
that lead to decoder failures [15]. For iterative decoding, these 
structures are known as trapping sets (see [16] for a list of 
references). 

To construct LDPC codes with provably low error floors, 
it is essential to understand the failure mechanism of the de- 
coders in the high SNR region as well as to fully characterize 
trapping sets. These prerequisites have been met for decoders 
on the binary erasure channel (BEC), in which case trapping 
sets are known under the notion of stopping sets [17]. For the 
BEC, the definition of stopping sets is fully combinatorial and 
the code construction strategy is simply to maximize the size 
of the smallest stopping set. Such a level of understanding has 
not been gained for other channels of interest. 

On other channels, such as the BSC or the additive white 
Gaussian noise channel (AWGNC), knowledge on trapping 
sets is far from complete due to the complex nature of iterative 
decoding algorithms, such as the SPA. As a result, code 
performance is typically improved by increasing the girth 
of the Tanner graph [14], [18], [19]. These 
approaches rest mainly on two facts. First, a linear 
increase in the girth results in an exponential increase of the 
minimum distance if the code has column weight $d_v \geq 3$ [20]. 
Second, trapping sets containing shortest cycles in the Tanner 
graph are eliminated when the girth is increased. In addition, 
several recent results can be used to justify the construction of 
a code with a large girth: the error correction capability under 
the bit flipping algorithms was shown to grow exponentially 
with the girth for codes with column weight $d_v \geq 5$ [21]; 
the minimum pseudo-codeword weight on the BSC for linear 
program decoding was also shown to increase exponentially 
with the girth [22]. It is worth noting here that the minimum 
stopping set size also grows exponentially with the girth for 
codes with column weight $d_v \geq 3$ [23]. 

Nevertheless, for finite length codes, large girth comes with 
a large penalty in code rate. In most cases, at a desirable code 
rate, the girth cannot be made large enough for the Tanner 
graph to be free of the most harmful trapping sets that mainly 
contribute to decoding failures in the error floor region. These 
trapping sets dictate the size of the smallest error patterns 
uncorrectable by the decoder and hence also dictate the slope 
of the frame error rate (FER) curve [16]. To preserve the 
rate while lowering the error floor, a code must be optimized not 
by simply increasing the girth but rather by more surgically 
avoiding the most harmful trapping sets. 

In this paper, LDPC codes are constructed so that they are 
free of small harmful trapping sets. We focus our attention 
on regular column-weight-three codes as these codes allow 
very low decoding complexity but exhibit very high error 
floor if they are not designed properly. A key element in the 
construction of a code free of trapping sets is the choice of 
forbidden subgraphs in the Tanner graph, since this choice 
greatly affects the error performance as well as the code rate. 
This choice is well determined if the Gallager A/B algorithm is 
used on the BSC since the necessary and sufficient conditions 
for a code to guarantee the correction of a given number of 
errors are known [24], [25]. However, for the SPA on the BSC 
and on the AWGNC, the choice of forbidden subgraphs is not 
clear due to the lack of a combinatorial characterization of 
trapping sets for these channels. In a series of papers [26]-[28] 
we used the notion of instantons to predict the error floors 
as well as to study the phenomenon from a statistical mechanics 
perspective. In [29] we showed how the family of instanton-based 
techniques can be used to estimate and reduce error 
floors for different decoders operating on a variety of channels. 
Unfortunately, instanton search is computationally prohibitive 
for construction of moderate length codes, and in this paper 
we propose another, simpler, method. 

In the absence of a complete understanding of trapping 
sets for the SPA, the choice of forbidden subgraphs may 
be derived based on the understanding of trapping sets for 
simpler decoding algorithms as well as on intuition gained 
from experimental results. This is the approach we take in 
this paper. A basis for removing harmful trapping sets for the 
SPA is the observation by Chilappagari et al. [29] that the decoding 
failures for various decoding algorithms and channels 
are closely related and that subgraphs responsible for these 
failures share some common underlying topological structures. 
These structures are either trapping sets for iterative decoding 
algorithms on the BSC or larger subgraphs containing these 
trapping sets. 

The method consists of three main steps. First, we develop 
a database of trapping sets for the Gallager A/B algorithm 
on the BSC. This database, which is called the Trapping Set 
Ontology (TSO),¹ contains subgraphs that are responsible for 
failures of the Gallager A/B decoder and also specifies the 
topological relations among them. Second, based on the TSO, 
we determine the relative harmfulness of different subgraphs 
for the SPA on the BSC by analyzing failures of the decoder on 
one subgraph in the presence or absence of other topologically 
related subgraphs. This analysis is performed repeatedly on 
a number of "test" Tanner graphs, which are intentionally 
constructed to either contain or be free of specific subgraphs. 
The relative harmfulness of a subgraph is evaluated based on 
its effect on the guaranteed correction capability of a code. 
Finally, a code is constructed so that its Tanner graph is free 
of the most harmful subgraphs. 

¹ This database of trapping sets was partially presented in [30] and is 
available online at [31]. 

It can be seen that our construction attempts to optimize a 
code for the SPA on the BSC. Due to much higher complexity, 
similar analysis on the AWGNC is difficult. However, exper- 
imental results show that codes constructed for the BSC also 
perform very well on the AWGNC. It should be noted that in 
[32], extensive computer simulation and hardware emulation 
suggest that absorbing sets mainly contribute to error floors of 
codes under the SPA on the AWGNC. Since absorbing sets are 
combinatorially similar to trapping sets for the Gallager A/B 
decoder, our newly constructed codes are also free of some 
(and probably the most harmful) absorbing sets and hence un- 
derstandably possess good error performance on the AWGNC. 
Although absorbing sets were introduced in research that dealt 
with the AWGNC, their unproven harmfulness prohibits an 
explicit strategy to construct codes for the AWGNC. As a 
result, optimizing codes for the BSC in order to obtain good 
performance on the AWGNC remains a reasonable approach. 

The rest of the paper is organized as follows. In Section 
II we provide background related to LDPC codes and the 
necessary preliminaries for the description of the new codes. 
In Section III we propose a new class of codes based on Latin 
squares obtained from the additive group of a Galois field. 
Relations of the new codes with existing codes in the literature 
are discussed in Appendices B and C. We continue 
with the presentation of our Trapping Set Ontology for the 
Gallager A/B decoder in Section IV. Analytical construction 
of a code free of trapping sets is difficult and hence we resort 
to an efficient search of the Tanner graph for certain subgraphs. 
We briefly discuss these search techniques in Section V, with 
more details given in Appendix A. In Section VI we 
describe in general the construction of a code free of certain 
trapping sets. We present the constructions of codes for the 
Gallager A/B algorithm and the SPA on the BSC in Sections 
VII and VIII. In Section IX we show the performance of 
several codes on the AWGNC and then conclude the paper. 

II. Preliminaries 

In this section, we introduce the definitions and notation 
used throughout the paper. 

A. LDPC Codes 

Let $\mathcal{C}$ denote an (n, k) LDPC code over the binary field 
GF(2). $\mathcal{C}$ is defined by the null space of H, an $m \times n$ parity 
check matrix of $\mathcal{C}$. H is the bi-adjacency matrix of G, a 
Tanner graph representation of $\mathcal{C}$. G is a bipartite graph with 
two sets of nodes: n variable (bit) nodes $V = \{1, 2, \ldots, n\}$ 
and m check nodes $C = \{1, 2, \ldots, m\}$. A vector 
$y = (y_1, y_2, \ldots, y_n)$ is a codeword if and only if $yH^T = 0$, 
where $H^T$ is the transpose of H. The support of y, denoted 
as supp(y), is defined as the set of all variable nodes (bits) 
$v \in V$ such that $y_v \neq 0$. A $d_v$-left-regular LDPC code has a 
Tanner graph G in which all variable nodes have degree $d_v$. 
Similarly, a $d_c$-right-regular LDPC code has a Tanner graph 
G in which all check nodes have degree $d_c$. A $(d_v, d_c)$ regular 
LDPC code is $d_v$-left-regular and $d_c$-right-regular. Such a code 
has rate $R \geq 1 - d_v/d_c$. The degree of a variable node 
(check node, resp.) is also referred to as the left degree (right 
degree, resp.) or the column weight (row weight, resp.). The 
length of the shortest cycle in the Tanner graph G is called 
the girth g of G. 
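
The definitions above translate into a few lines of code. The sketch below is illustrative only: the helper names and the toy 4 x 6 parity check matrix (column weight 2, row weight 3) are assumptions, not objects from this paper. It checks the codeword condition, computes the support with the 1-based labels of V, and evaluates the lower bound on the rate.

```python
from fractions import Fraction


def is_codeword(y, H):
    """y is a codeword iff every row of H has even overlap with y (y H^T = 0 over GF(2))."""
    return all(sum(h * b for h, b in zip(row, y)) % 2 == 0 for row in H)


def support(y):
    """1-based indices of the nonzero positions of y, matching V = {1, ..., n}."""
    return {v + 1 for v, bit in enumerate(y) if bit}


def degrees(H):
    """Return (variable node degrees, check node degrees), i.e. column and row weights."""
    return [sum(col) for col in zip(*H)], [sum(row) for row in H]


# Hypothetical toy matrix: each column has weight 2 and each row has weight 3.
H = [
    [1, 1, 1, 0, 0, 0],
    [1, 0, 0, 1, 1, 0],
    [0, 1, 0, 1, 0, 1],
    [0, 0, 1, 0, 1, 1],
]
y = [1, 1, 0, 1, 0, 0]

var_deg, chk_deg = degrees(H)
d_v, d_c = var_deg[0], chk_deg[0]
rate_bound = 1 - Fraction(d_v, d_c)  # R >= 1 - d_v/d_c

print(is_codeword(y, H), support(y), rate_bound)
```

The bound is not tight here: the four rows of this toy matrix sum to zero over GF(2), so the actual rate is larger than the bound.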



B. Permutation Matrices from Latin Squares 

A permutation matrix is a square binary matrix that has 
exactly one entry 1 in each row and each column and 0's 
elsewhere. Our codes make use of permutation matrices that do 
not have 1's in common positions. These sets of permutation 
matrices can be obtained conveniently from Latin squares. 

A Latin square of size q (or order q) is a q x q array in 
which each cell contains a single symbol from a q-set S, such 
that each symbol occurs exactly once in each row and exactly 
once in each column. A Latin square of size q is equivalent 
to the Cayley table of a quasigroup Q on q elements (see [33, 
pp. 135-152] for details). 

For mathematical convenience, we use elements of Q to 
index the rows and columns of Latin squares and permutation 
matrices. Let $\mathcal{L} = [l_{i,j}]_{i,j \in Q}$ denote a Latin square defined 
on the Cayley table of a quasigroup $(Q, \oplus)$ of order q. We 
define f, an injective map from Q to Mat(q, q, GF(2)), where 
Mat(q, q, GF(2)) is the set of matrices of size $q \times q$ over GF(2), 
as follows: 

$$ f : Q \to \mathrm{Mat}(q, q, \mathrm{GF}(2)), \qquad \alpha \mapsto f(\alpha) = [m_{i,j}]_{i,j \in Q} $$

such that: 

$$ m_{i,j} = \begin{cases} 1 & \text{if } l_{i,j} = \alpha \\ 0 & \text{if } l_{i,j} \neq \alpha \end{cases} $$

According to this definition, a permutation matrix corresponding 
to the element $\alpha \in Q$ is obtained by replacing the 
entries of $\mathcal{L}$ which are equal to $\alpha$ by 1 and all other entries of 
$\mathcal{L}$ by 0. It follows from the above definition that the images of 
elements of Q under f give a set of q permutation matrices that 
do not have 1's in common positions. This definition naturally 
associates a permutation matrix to an element $\alpha \in Q$ and 
simplifies the derivation of parity check matrices that satisfy 
the RC constraint, as will be demonstrated in the next section. 
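
The map f is simple enough to sketch in code. The block below is an illustration under assumptions: the function names and the 4 x 4 Latin square are chosen for the example, and rows and columns are indexed by position rather than by quasigroup element. It builds the q permutation matrices and confirms that their 1's never coincide.

```python
def is_latin_square(L):
    """Each symbol of a q-set occurs exactly once in every row and every column."""
    q = len(L)
    symbols = set(L[0])
    return (
        len(symbols) == q
        and all(len(row) == q and set(row) == symbols for row in L)
        and all(set(col) == symbols for col in zip(*L))
    )


def f(alpha, L):
    """Permutation matrix with a 1 wherever the Latin square entry equals alpha."""
    return [[1 if entry == alpha else 0 for entry in row] for row in L]


# Illustrative Latin square of order 4 (an assumption made for this sketch).
L = [
    [0, 1, 2, 3],
    [1, 0, 3, 2],
    [2, 3, 1, 0],
    [3, 2, 0, 1],
]
mats = {a: f(a, L) for a in range(4)}

# Every cell of L holds exactly one symbol, so the q images of f tile the
# q x q grid: summed entrywise they give the all-ones matrix.
overlap = [[sum(mats[a][i][j] for a in mats) for j in range(4)] for i in range(4)]
print(is_latin_square(L), overlap)
```

Because each cell of the Latin square contributes a 1 to exactly one of the matrices, any two distinct images of f automatically satisfy the "no 1's in common positions" requirement.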

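Example 1: Let Q be a quasigroup of order 4 with the 
following Cayley table: 

$$ \begin{array}{c|cccc}
\oplus & 0 & 1 & 2 & 3 \\ \hline
0 & 0 & 1 & 2 & 3 \\
1 & 1 & 0 & 3 & 2 \\
2 & 2 & 3 & 1 & 0 \\
3 & 3 & 2 & 0 & 1
\end{array} $$

The Latin square obtained from the Cayley table of Q is: 

$$ \mathcal{L} = \begin{bmatrix}
0 & 1 & 2 & 3 \\
1 & 0 & 3 & 2 \\
2 & 3 & 1 & 0 \\
3 & 2 & 0 & 1
\end{bmatrix} $$

The injective map f sends elements of Q to four permutation 
matrices: 

$$ f(0) = \begin{bmatrix} 1&0&0&0 \\ 0&1&0&0 \\ 0&0&0&1 \\ 0&0&1&0 \end{bmatrix}, \quad
f(1) = \begin{bmatrix} 0&1&0&0 \\ 1&0&0&0 \\ 0&0&1&0 \\ 0&0&0&1 \end{bmatrix}, $$

$$ f(2) = \begin{bmatrix} 0&0&1&0 \\ 0&0&0&1 \\ 1&0&0&0 \\ 0&1&0&0 \end{bmatrix}, \quad
f(3) = \begin{bmatrix} 0&0&0&1 \\ 0&0&1&0 \\ 0&1&0&0 \\ 1&0&0&0 \end{bmatrix}. $$
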



C. LDPC Codes as Arrays of Permutation Matrices 

The definition of an LDPC code whose parity check matrix 
is an array of permutation matrices is now straightforward. Let 
$W = [w_{i,j}]_{1 \leq i \leq \mu,\, 1 \leq j \leq \eta}$ be a $\mu \times \eta$ matrix over a quasigroup 
Q, i.e., 

$$ W = \begin{bmatrix}
w_{1,1} & w_{1,2} & \cdots & w_{1,\eta} \\
w_{2,1} & w_{2,2} & \cdots & w_{2,\eta} \\
\vdots & \vdots & \ddots & \vdots \\
w_{\mu,1} & w_{\mu,2} & \cdots & w_{\mu,\eta}
\end{bmatrix} \tag{1} $$

With some abuse of notation, let $H = f(W) = [f(w_{i,j})]$ 
be an array of permutation matrices, obtained by replacing 
elements of W with their images under f, i.e., 

$$ H = \begin{bmatrix}
f(w_{1,1}) & f(w_{1,2}) & \cdots & f(w_{1,\eta}) \\
f(w_{2,1}) & f(w_{2,2}) & \cdots & f(w_{2,\eta}) \\
\vdots & \vdots & \ddots & \vdots \\
f(w_{\mu,1}) & f(w_{\mu,2}) & \cdots & f(w_{\mu,\eta})
\end{bmatrix} \tag{2} $$

Then H is a binary matrix of size $\mu q \times \eta q$. The null space of 
H gives an LDPC code $\mathcal{C}$ of length $\eta q$. The column weight 
and row weight of $\mathcal{C}$ are $d_v = \mu$ and $d_c = \eta$, respectively. 

We remark that different permutations of rows and columns 
of the Latin square $\mathcal{L}$ result in different sets of permutation 
matrices. These sets of permutation matrices result in different 
permutations of H in (2). Since permuting rows and columns 
of H only leads to the relabeling of the variable nodes and 
check nodes of the corresponding Tanner graph, different 
permutations of rows and columns of the Latin square $\mathcal{L}$ result 
in the same code. Therefore, a code is completely specified 
by a quasigroup Q along with a matrix over Q. 
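
The expansion of W into H in (2) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the helper names, the 4 x 4 Latin square and the 2 x 3 matrix W are assumptions made for the example.

```python
def f(alpha, L):
    """Permutation matrix with a 1 wherever the Latin square entry equals alpha."""
    return [[1 if entry == alpha else 0 for entry in row] for row in L]


def expand(W, L):
    """Replace every entry of W by its q x q image under f.

    For a mu x eta matrix W this yields a (mu*q) x (eta*q) binary matrix.
    """
    q = len(L)
    H = []
    for w_row in W:
        blocks = [f(w, L) for w in w_row]
        for r in range(q):
            # Concatenate row r of every block in this block-row.
            H.append([bit for block in blocks for bit in block[r]])
    return H


# Illustrative inputs (assumptions): a Latin square of order q = 4 and a 2 x 3 matrix W.
L = [
    [0, 1, 2, 3],
    [1, 0, 3, 2],
    [2, 3, 1, 0],
    [3, 2, 0, 1],
]
W = [
    [0, 0, 0],
    [0, 1, 2],
]
H = expand(W, L)

print(len(H), len(H[0]))                 # mu*q rows, eta*q columns
print({sum(col) for col in zip(*H)})     # column weights: all equal to mu
print({sum(row) for row in H})           # row weights: all equal to eta
```

Regularity comes for free from the block structure, since every block contributes exactly one 1 to each of its rows and columns.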

III. Structured LDPC Codes from Galois Fields of 
Permutation Matrices 

The codes in this section are obtained when Q is the additive 
group of a Galois field. When Q is the multiplicative group 
of a Galois field, the codes proposed in [5] are obtained. We 
discuss this class of codes in Appendix C. The codes in this 
section also contain array LDPC codes [9] when the Galois 
field is a prime field, as shown in Appendix B. 

A. Galois Fields of Permutation Matrices 

Consider a Galois field GF(q), where $q = p^{\vartheta}$, $\vartheta \in \mathbb{Z}^{+}$ 
and p is prime. Let $\alpha$ be a primitive element of GF(q). The 
powers of $\alpha$, $\alpha^{-\infty} = 0$, $\alpha^{0} = 1$, $\alpha, \alpha^{2}, \ldots, \alpha^{q-2}$, give all q 
elements of GF(q) and $\alpha^{q-1} = 1$. Let $\mathcal{L} = [l_{i,j}]_{i,j \in Q}$ denote 
a Latin square defined by the Cayley table of $(Q, \oplus)$ where 
$Q = \{0, 1, \alpha, \ldots, \alpha^{q-2}\}$ and $\oplus$ is the subtractive operation of 
GF(q), i.e., $l_{i,j} = i - j$. Although the rows and columns of $\mathcal{L}$ 
can be indexed arbitrarily, for simplicity we assume that they 
are indexed from top to bottom and left to right with increasing 
powers of $\alpha$. Let $\mathcal{M} = \{M_{-\infty}, M_0, M_1, \ldots, M_{q-2}\}$ be the set 
of images of elements of Q under f, i.e., 
$M_t = [m^{(t)}_{i,j}]_{i,j \in Q} = f(\alpha^t)$. 
It is easy to see that $M_{-\infty} = I$, the $q \times q$ identity matrix. 
To show that $\mathcal{M}$ forms a field isomorphic to GF(q) under 
the matrix operations defined below, we give the following 
propositions. 

Proposition 1: For all $t_1, t_2 \in \mathbb{Z}$, $f(\alpha^{t_1} + \alpha^{t_2}) = M_{t_1} M_{t_2}$. 

Proof: Let $\Xi = [\xi_{i,j}]_{i,j \in Q} = M_{t_1} M_{t_2}$; then 

$$ \xi_{i,j} = \sum_{r \in Q} m^{(t_1)}_{i,r}\, m^{(t_2)}_{r,j}. $$

Since $M_{t_1}$ and $M_{t_2}$ are permutation matrices, $\Xi$ is a permutation 
matrix. Assume that $\xi_{i,j} = 1$. Then there exists $r \in Q$ 
such that $m^{(t_1)}_{i,r} = m^{(t_2)}_{r,j} = 1$. This indicates that $i - r = \alpha^{t_1}$ 
and $r - j = \alpha^{t_2}$. Adding, $i - j = \alpha^{t_1} + \alpha^{t_2}$ and hence 
$\Xi = M_{t_1} M_{t_2} = f(\alpha^{t_1} + \alpha^{t_2})$. ∎ 

Corollary 1: $M_t^{p} = I$, $\forall t$. 

Proposition 2: For all $t \geq 0$, $M_{t+1} = P M_t Q$, where P is 
a $q \times q$ permutation matrix given as 

$$ P = \begin{bmatrix}
1 & 0 & 0 & \cdots & 0 & 0 \\
0 & 0 & 0 & \cdots & 0 & 1 \\
0 & 1 & 0 & \cdots & 0 & 0 \\
0 & 0 & 1 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & 1 & 0
\end{bmatrix} \tag{3} $$

and $Q = P^T$, the transpose of P. 

Proof: Consider two matrices $M_t = f(\alpha^t)$ and $M_{t+1} = 
f(\alpha^{t+1})$ for some $t \geq 0$. Assume that $m^{(t)}_{i,j} = 1$; then 
$l_{i,j} = i - j = \alpha^t$. Consequently, $l_{\alpha i, \alpha j} = \alpha i - \alpha j = \alpha^{t+1}$ 
and $m^{(t+1)}_{\alpha i, \alpha j} = 1$. Therefore we can obtain $M_{t+1}$ from $M_t$ by 
performing the following two operations: 

• Cyclic permutation of the last $q - 1$ rows of $M_t$, and 

• Cyclic permutation of the last $q - 1$ columns of the 
resulting matrix. 

It is now clear that $M_{t+1} = P M_t Q$. ∎ 

Define the addition $\boxplus$ and the multiplication $\boxdot$ on $\mathcal{M}$ as: 

$$ M_{t_1} \boxplus M_{t_2} = M_{t_1} M_{t_2}, $$

$$ M_{t_1} \boxdot M_{t_2} = P^{t_2} M_{t_1} Q^{t_2} = P^{t_1} M_{t_2} Q^{t_1}; $$

then $\mathcal{M}$ together with $\boxplus$ and $\boxdot$ form a field isomorphic to 
GF(q). 

Remark: Assume that the rows and columns of $\mathcal{L}$ are indexed 
arbitrarily. Let $(\alpha^{i_1}, \alpha^{i_2}, \ldots, \alpha^{i_q})$ be indices of the rows 
of $\mathcal{L}$ from top to bottom and let $(\alpha^{j_1}, \alpha^{j_2}, \ldots, \alpha^{j_q})$ be indices 
of the columns of $\mathcal{L}$ from left to right. Proposition 2 holds 
if P and Q are chosen so that the indices of the rows (from 
top to bottom) and the columns (from left to right) of $P\mathcal{L}Q$ 
are $(\alpha^{i_1+1}, \alpha^{i_2+1}, \ldots, \alpha^{i_q+1})$ and $(\alpha^{j_1+1}, \alpha^{j_2+1}, \ldots, \alpha^{j_q+1})$, 
respectively. 
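
Propositions 1 and 2 and Corollary 1 can be verified numerically on a small non-prime field. The sketch below uses GF(4) under an assumed encoding: the integers 0, 1, 2, 3 stand for 0, 1, α, α², with α a root of x² + x + 1, so that addition and subtraction are both bitwise XOR. The encoding, the helper names and the choice q = 4 are assumptions made for illustration.

```python
# GF(4) under the assumed encoding 0, 1, 2 = alpha, 3 = alpha^2.
# In characteristic 2, i - j and i + j are both i XOR j.
INDEX = [0, 1, 2, 3]      # row/column order (0, 1, alpha, alpha^2)
ALPHA_POW = [1, 2, 3]     # alpha^0, alpha^1, alpha^2
Q = len(INDEX)


def f(a):
    """M = f(a): entry (i, j) is 1 iff l_{i,j} = i - j equals a."""
    return [[1 if (i ^ j) == a else 0 for j in INDEX] for i in INDEX]


def matmul(A, B):
    return [[sum(A[i][r] * B[r][j] for r in range(Q)) for j in range(Q)]
            for i in range(Q)]


def transpose(A):
    return [list(col) for col in zip(*A)]


# P from (3) for q = 4: fixes the first index and cyclically shifts the others.
P = [
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
]
I = f(0)

# Proposition 1: f(a + b) = f(a) f(b).
prop1 = all(matmul(f(a), f(b)) == f(a ^ b) for a in INDEX for b in INDEX)
# Corollary 1 with p = 2: every matrix squares to the identity.
cor1 = all(matmul(f(a), f(a)) == I for a in INDEX)
# Proposition 2: M_{t+1} = P M_t P^T, with exponents taken modulo q - 1.
prop2 = all(
    matmul(matmul(P, f(ALPHA_POW[t])), transpose(P)) == f(ALPHA_POW[(t + 1) % (Q - 1)])
    for t in range(Q - 1)
)

print(prop1, cor1, prop2)
```

For q = 4 the matrices f(α^t) are not circulants in the usual sense, which illustrates why the construction is described on indices taken from GF(q) rather than on integer shifts.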
B. LDPC Codes from Galois Fields of Permutation Matrices 

Define W and H as in (1) and (2), where $(Q, \oplus)$ is the set 
$\{0, 1, \alpha, \ldots, \alpha^{q-2}\}$ together with the subtractive operation of 
GF(q). The following theorem gives a necessary and sufficient 
condition on W, such that the Tanner graph corresponding to 
H has girth at least 6. 

Theorem 1 (Cross-addition Constraint): The Tanner graph 
corresponding to H contains no cycle of length four iff 
$w_{i_1,j_1} + w_{i_2,j_2} \neq w_{i_1,j_2} + w_{i_2,j_1}$ for any $1 \leq i_1, i_2 \leq \mu$, 
$1 \leq j_1, j_2 \leq \eta$, $i_1 \neq i_2$, $j_1 \neq j_2$. 

Proof: The Tanner graph corresponding to H contains at 
least one cycle of length four if and only if there exist two rows 
of H that have 1's in at least two common positions. Treat H 
as a matrix over $\mathbb{R}$ and let $\Xi = H H^T$. Then $\Xi$ is a matrix over 
$\mathbb{R}$. H contains two rows that have 1's in at least two common 
positions if and only if $\Xi$ contains at least one non-diagonal 
component $\xi > 1$. Since H is an array of matrices, $\Xi$ is also 
an array of matrices. Also, $f(\alpha^t)$ is a permutation matrix, so 
its transpose is its inverse and is $f(-\alpha^t)$ (by Proposition 1). 
Therefore, $\Xi = [\Xi_{i_1,i_2}]_{1 \leq i_1, i_2 \leq \mu}$ where 

$$ \Xi_{i_1,i_2} = \sum_{r=1}^{\eta} f(w_{i_1,r})\, f(-w_{i_2,r}). $$

Since $f(w_{i_1,r}) f(-w_{i_2,r}) \in \mathcal{M}$, $\Xi_{i_1,i_2}$ contains an element 
$\xi > 1$ if and only if there exist $j_1 \neq j_2$ such that 

$$ \begin{aligned}
f(w_{i_1,j_1})\, f(-w_{i_2,j_1}) &= f(w_{i_1,j_2})\, f(-w_{i_2,j_2}) \\
\Leftrightarrow \quad w_{i_1,j_1} - w_{i_2,j_1} &= w_{i_1,j_2} - w_{i_2,j_2} \\
\Leftrightarrow \quad w_{i_1,j_1} + w_{i_2,j_2} &= w_{i_1,j_2} + w_{i_2,j_1}.
\end{aligned} $$
∎ 

It can be seen that the construction of an LDPC code with 
girth at least 6 from a Galois field of permutation matrices 
reduces to finding a matrix W that satisfies the cross-addition 
constraint. 
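
The equivalence in Theorem 1 can be illustrated on small cases. The sketch below is an assumption-laden example rather than the paper's procedure: it restricts q to a prime so that GF(q) arithmetic is integer arithmetic modulo q, indexes rows and columns by 0, ..., q - 1 instead of by powers of α (a relabeling that, by the remark in Section II-C, does not change the code), and uses two hypothetical matrices W.

```python
from itertools import combinations

q = 5  # assumed prime, so GF(q) is the integers modulo q


def satisfies_cross_addition(W, q):
    """True iff w[i1][j1] + w[i2][j2] != w[i1][j2] + w[i2][j1] for all i1 != i2, j1 != j2."""
    for r1, r2 in combinations(W, 2):
        for j1, j2 in combinations(range(len(r1)), 2):
            if (r1[j1] + r2[j2]) % q == (r1[j2] + r2[j1]) % q:
                return False
    return True


def f(a, q):
    """Permutation matrix of a: entry (i, j) is 1 iff i - j = a in GF(q)."""
    return [[1 if (i - j) % q == a % q else 0 for j in range(q)] for i in range(q)]


def expand(W, q):
    """Binary matrix H = f(W), built block-row by block-row."""
    H = []
    for w_row in W:
        blocks = [f(w, q) for w in w_row]
        for r in range(q):
            H.append([bit for block in blocks for bit in block[r]])
    return H


def has_four_cycle(H):
    """A 4-cycle exists iff two rows share 1's in at least two positions."""
    supports = [{j for j, b in enumerate(row) if b} for row in H]
    return any(len(a & b) >= 2 for a, b in combinations(supports, 2))


# Hypothetical matrices over GF(5), chosen for illustration.
W_good = [[0, 0, 0], [0, 1, 2], [0, 2, 4]]
W_bad = [[0, 0, 0], [0, 1, 2], [0, 1, 3]]

for W in (W_good, W_bad):
    print(satisfies_cross_addition(W, q), has_four_cycle(expand(W, q)))
```

In `W_bad` the last two rows agree in their first two entries, so the corresponding block rows of H repeat the same pair of permutation matrices and a cycle of length four appears.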

Example 2: It can be noticed that a Latin square obtained 
from the Cayley table of the multiplicative group of GF(q) 
satisfies the cross-addition constraint. The cross-addition 
constraint is still satisfied if a row and a column of all zeros are 
appended to such a Latin square. Therefore, one form of W 
that satisfies the cross-addition constraint is given by 

$$ W = \begin{bmatrix}
0 & 0 & 0 & \cdots & 0 \\
0 & 1 & \alpha & \cdots & \alpha^{q-2} \\
0 & \alpha & \alpha^{2} & \cdots & 1 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & \alpha^{q-2} & 1 & \cdots & \alpha^{q-3}
\end{bmatrix} \tag{4} $$

Let $\mathcal{H} = f(W)$. From Proposition 2 it follows that $\mathcal{H}$ has 
the following structure: 

$$ \mathcal{H} = \begin{bmatrix}
I & I & I & \cdots & I \\
I & M_0 & M_1 & \cdots & M_{q-2} \\
I & M_1 & M_2 & \cdots & M_0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
I & M_{q-2} & M_0 & \cdots & M_{q-3}
\end{bmatrix} \tag{5} $$

where $M_t = P^t M_0 Q^t$ and I is the $q \times q$ identity matrix. $\mathcal{H}$ 
is an array of permutation matrices from $\mathcal{M}$ and is a $q^2 \times q^2$ 
matrix over GF(2) with both row and column weights q. Since 
W satisfies the cross-addition constraint, the Tanner graph 
corresponding to $\mathcal{H}$ contains no cycle of length 4. 

For any pair $(\gamma, \rho)$ of positive integers with $1 \leq \gamma, \rho \leq q$, 
let H be a $\gamma \times \rho$ subarray of $\mathcal{H}$. Then H is a $\gamma q \times \rho q$ matrix 
over GF(2) which is also free of cycles of length 4. H has 
constant column weight $d_v = \gamma$ and row weight $d_c = \rho$. The 
null space of H gives a regular structured LDPC code $\mathcal{C}$ of 
length $\rho q$. It can be shown that the rank of H is $\gamma q - \gamma + 1$, 
and hence $\mathcal{C}$ has rate $R = 1 - \frac{\gamma q - \gamma + 1}{\rho q}$, which for $\rho = q$ equals 
$\frac{q - \gamma}{q} + \frac{\gamma - 1}{q^2}$. 

C. Remarks 

For any parity check matrix $H'$ which is an array of 
permutation matrices, we can permute the rows and columns 
to obtain $\mathcal{H}$ such that the topmost and leftmost permutation 
matrices of $\mathcal{H}$ are identity matrices. The matrix $\mathcal{H}$ is the image 
of a matrix W under f, where entries on the first row and 
first column of W are $0 \in \mathrm{GF}(q)$. Therefore, in the rest of 
the paper, we only consider matrices W of which elements on 
the first row and on the first column are zeros. For simplicity, 
we denote U as the submatrix of W such that 

$$ W = \begin{bmatrix} 0 & \mathbf{0} \\ \mathbf{0}^T & U \end{bmatrix} \tag{6} $$

and then write $\mathcal{H} = f(W) = f(U)$. 

It can be seen that the notion of Latin squares provides a 
general and elegant description for a wide variety of structured 
LDPC codes whose parity check matrices are arrays of 
permutation matrices. For the codes described in this section, 
the permutation matrices are more general than circulant 
permutation matrices as the circulant property for our codes 
holds on indices understood as elements of GF(q). Specifically, 
the permutation matrix corresponding to $\alpha^t$ sends the indices 
$(0, 1, \alpha, \ldots, \alpha^{q-2})$ to $(0 + \alpha^t, 1 + \alpha^t, \alpha + \alpha^t, \ldots, \alpha^{q-2} + \alpha^t)$. 

In Appendix B we show that the class of codes described 
in this section includes array LDPC codes [9]. In particular, let 
q be a prime; then an array LDPC code is a subarray $H_{\mathrm{arr}}$ of 
the binary matrix $\mathcal{H}_{\mathrm{arr}}$ that is obtained by permuting rows and 
columns of $\mathcal{H}$ in (5) in a certain way. Note that similar to $\mathcal{H}$ in 
(5), $\mathcal{H}_{\mathrm{arr}}$ is also a $q \times q$ array of permutation matrices. In [13], 
[14], a method is given to construct a shortened array LDPC 
code of large girth by selecting certain blocks of columns of 
$H_{\mathrm{arr}}$ to form the parity check matrix. Assume that $H^{(s)}_{\mathrm{arr}}$ is 
such a parity check matrix; then $H^{(s)}_{\mathrm{arr}}$ is a subarray of $H_{\mathrm{arr}}$ 
and is also a subarray of $\mathcal{H}_{\mathrm{arr}}$. This approach utilizes the fact 
that the Tanner graph representation of $\mathcal{H}_{\mathrm{arr}}$ is free of four 
cycles and hence the Tanner graph representation of $H^{(s)}_{\mathrm{arr}}$ is 
also free of four cycles. However, starting from a predefined 
matrix $\mathcal{H}_{\mathrm{arr}}$ is not a good solution in terms of code rate. This 
is because the fact that $H^{(s)}_{\mathrm{arr}}$ is a subarray of $\mathcal{H}_{\mathrm{arr}}$ can also be 
understood as a constraint on $H^{(s)}_{\mathrm{arr}}$ and therefore one might 
expect this constraint to reduce the code rate. The description 
of the codes proposed in this section along with the cross-addition 
constraint allows the construction of a parity check 
matrix in which the above-mentioned constraint is eliminated. 
This method of construction will be presented in Section VI-A. 
Since the constraint is eliminated, the construction usually 
results in codes with higher rates than those of shortened array 
codes. In this paper, we use this construction method together 
with the TSO (presented in the next section) to obtain codes 
with low error floors. 
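
As a quick arithmetic check of how the rank stated in Section III-B determines the rate, consider the illustrative values q = 5, γ = 3 and ρ = 5; these values are assumptions chosen for the example and are not taken from the paper. With code length ρq = 25:

```latex
\begin{align*}
\operatorname{rank}(H) &= \gamma q - \gamma + 1 = 3 \cdot 5 - 3 + 1 = 13, \\
R &= 1 - \frac{\operatorname{rank}(H)}{\rho q} = 1 - \frac{13}{25} = \frac{12}{25} = 0.48, \\
1 - \frac{d_v}{d_c} &= 1 - \frac{\gamma}{\rho} = 1 - \frac{3}{5} = 0.40 .
\end{align*}
```

The rate exceeds the bound $1 - d_v/d_c$ of Section II-A because $\gamma - 1 = 2$ of the $\gamma q = 15$ rows of H are linearly dependent on the others.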

IV. Trapping Set Ontology 

In this section, we describe our database of trapping sets 
known as the Trapping Set Ontology. This database will be 
used as a guideline for the construction of codes free of small 
trapping sets to be presented in subsequent sections. We start 
with a brief discussion of trapping sets and related objects. 

A. Trapping Sets 

A trapping set for an iterative decoding algorithm is defined 
as a non-empty set of variable nodes in a Tanner graph G that 
are not eventually corrected by the decoder [15]. A set of 
variable nodes T is called an (a, b) trapping set if it contains 
a variable nodes and the subgraph induced by these variable 
nodes has b odd degree check nodes. 
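
The (a, b) label of a set of variable nodes is easy to compute from H. The sketch below is illustrative: the function name, the 0-based column indices and the toy matrix (three variable nodes of column weight three lying on a six-cycle) are assumptions.

```python
def trapping_set_label(T, H):
    """Return (a, b) for a set T of variable nodes (0-based column indices of H).

    a is |T|; b is the number of check nodes that have odd degree in the
    subgraph induced by T.
    """
    T = set(T)
    b = sum(1 for row in H if sum(row[v] for v in T) % 2 == 1)
    return len(T), b


# Hypothetical example: three variable nodes on a six-cycle.
# Checks c0..c2 each join two of the variables; c3..c5 each touch only one.
H = [
    [1, 1, 0],
    [0, 1, 1],
    [1, 0, 1],
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
]

print(trapping_set_label({0, 1, 2}, H))
print(trapping_set_label({0, 1}, H))
```

As the text notes later, the pair (a, b) alone does not identify the induced subgraph, so such a label is only a coarse summary.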

For transmission over the BEC, trapping sets are characterized 
combinatorially and are known as stopping sets [17]. 
For transmission over the AWGNC, no explicit combinatorial 
characterization of trapping sets has been found. In the case 
of the BSC, when decoding with the Gallager A/B algorithm 
or the bit flipping (serial or parallel) algorithms, trapping 
sets are partially characterized under the notion of fixed sets. 
By partially, we mean that these combinatorial objects form a 
subclass of trapping sets, but not all trapping sets are fixed 
sets. Fixed sets have been studied extensively in a series 
of conference papers [16], [21], [24], [34]. They have been 
proven to be the cause of the error floor in the decoding of 
LDPC codes under the Gallager A/B algorithm and the bit 
flipping algorithms. For the sake of completeness, we give the 
definition of a fixed set as well as the necessary and sufficient 
conditions for a set of variable nodes to form a fixed set. 

Consider an iterative decoder on the BSC. Assume the 
transmission of the all-zero codeword.² With this assumption, 
a variable node is correct if it is 0 and corrupt if it is 1. 
Let $y = (y_1, y_2, \ldots, y_n)$ be the input to the decoder and 
let $x^l = (x^l_1, x^l_2, \ldots, x^l_n)$ be the output vector at the $l$-th 
iteration. Let F(y) denote the set of variable nodes that are 
not eventually correct. 

Definition 1: For transmission over the BSC, y is a fixed 
point of the decoding algorithm if $\mathrm{supp}(y) = \mathrm{supp}(x^l)$ for all 
$l$. If $F(y) \neq \emptyset$ and y is a fixed point, then F(y) = supp(y) 
is a fixed set. A fixed set (trapping set) is an elementary fixed 
set (trapping set) if all check nodes in its induced subgraph 
have degree one or two.³ Otherwise, it is a non-elementary 
fixed set (trapping set). 

Theorem 2: Let $\mathcal{C}$ be an LDPC code with a $d_v$-left-regular 
Tanner graph G. Let T be a set consisting of variable 
nodes with induced subgraph I. Let the check nodes in I be 
partitioned into two disjoint subsets: O, consisting of check 
nodes with odd degree, and E, consisting of check nodes 
with even degree. Then T is a fixed set for the bit flipping 
algorithms (serial or parallel) iff: (a) every variable node in 
I has at least $\lceil d_v/2 \rceil$ neighbors in E, and (b) no $\lfloor d_v/2 \rfloor + 1$ check 
nodes of O share a neighbor outside I. 

² The all-zero-codeword assumption can be applied if the channel is output 
symmetric and the decoding algorithm satisfies certain symmetry conditions 
(see Definition 1 and Lemma 1 in [35]). The Gallager A/B algorithm, the bit 
flipping algorithms and the SPA all satisfy these symmetry conditions. 

³ This classification was given in [36]. 

Note that Theorem 2 only states the conditions for the bit 
flipping algorithms. However, it is not difficult to show that 
these conditions also apply to the Gallager A/B algorithm. 
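
Conditions (a) and (b) of Theorem 2 can be tested mechanically. The following is a minimal sketch under assumptions: the function name, the 0-based indexing and the two toy column-weight-three matrices are illustrative, and condition (b) is read as "no variable node outside T is adjacent to $\lfloor d_v/2 \rfloor + 1$ or more check nodes of O".

```python
def is_fixed_set(T, H, d_v):
    """Check conditions (a) and (b) of Theorem 2 for a set T of variable nodes.

    T holds 0-based column indices of H; d_v is the common column weight.
    """
    T = set(T)
    n = len(H[0])
    deg_in_T = [sum(row[v] for v in T) for row in H]
    E = {c for c, d in enumerate(deg_in_T) if d > 0 and d % 2 == 0}
    O = {c for c, d in enumerate(deg_in_T) if d % 2 == 1}

    # (a) every variable node in T has at least ceil(d_v / 2) neighbors in E
    need = (d_v + 1) // 2
    if any(sum(1 for c in E if H[c][v]) < need for v in T):
        return False

    # (b) no variable node outside T is adjacent to floor(d_v / 2) + 1 checks of O
    limit = d_v // 2 + 1
    outside = (v for v in range(n) if v not in T)
    return all(sum(1 for c in O if H[c][v]) < limit for v in outside)


# Hypothetical matrix: variables 0..2 lie on a six-cycle; variable 3 touches
# only one of the odd-degree checks (c3), plus two checks away from the set.
H_fixed = [
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 1],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 1],
]
# Variant in which variable 3 touches two odd-degree checks (c3 and c4).
H_not = [row[:] for row in H_fixed]
H_not[4][3] = 1
H_not[7][3] = 0

print(is_fixed_set({0, 1, 2}, H_fixed, 3))
print(is_fixed_set({0, 1, 2}, H_not, 3))
```

In the second matrix, the outside variable node shares two checks of O, which is enough under condition (b) for the set to stop being a fixed set.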

Although it has been rigorously proven only that fixed sets 
are trapping sets for the Gallager A/B algorithm and the bit 
flipping algorithms on the BSC, it has been widely recognized 
in the literature that the subgraphs of these combinatorial 
objects greatly contribute to the error floor for various iterative 
decoding algorithms and channels. The instanton analysis 
performed by Chilappagari et al. in [29] suggests that the 
decoding failures for various decoding algorithms and channels 
are closely related and subgraphs responsible for these 
failures share some common underlying topological structures. 
These structures are either trapping sets for iterative decoding 
algorithms on the BSC, of which fixed sets form a subset, 
or larger subgraphs containing these trapping sets. Dolecek et 
al. in [37] defined the notion of absorbing sets, which is very 
similar to the notion of fixed sets. By hardware emulation, they 
found that absorbing sets are the main cause of error floors 
for the SPA on the AWGNC. Various trapping sets identified 
by simulation (for example those in [38], [39]) are also fixed 
sets. 

From these observations, it is expected that an LDPC code 
will have low error floor performance if the corresponding 
Tanner graph does not contain subgraphs induced by fixed 
sets. However, it is impossible to construct an LDPC code 
whose Tanner graph is free of all fixed sets when the length 
of the code is finite. It is also well-known that imposing 
constraints on a Tanner graph reduces the rate of a code. 
Clearly, only subgraphs of some fixed sets can be avoided 
in the code construction. These need to be chosen carefully in 
order to obtain the best possible error floor performance while 
maximizing the code rate. 

Before one can attempt to determine the fixed sets to forbid 
in the Tanner graph of a code, there are two important issues 
that need to be addressed. First, a complete list of non- 
isomorphic fixed sets (up to a proper size) for a given set 
of code parameters (e.g., column weight and row weight) 
is needed. This is because the notion of an (a, b) fixed set 
(trapping set) is not sufficient. Given a pair of positive integers 
(a, b), there are possibly many fixed sets which induce non- 
isomorphic subgraphs containing a variable nodes and b odd 
degree check nodes. Second, the topological relations among
subgraphs induced by fixed sets need to be explored. The
importance of these relations is threefold. First, the subgraph
induced by a fixed set may be contained in the subgraph
induced by another fixed set. In such a case, the absence of
the contained subgraph implies the absence of the containing one. Second,
these relations help reduce the complexity of the search for 
subgraphs in a Tanner graph. Finally, these relations reduce 
the complexity of the analysis to determine the harmfulness 
of subgraphs. 
In the next subsection, we present our database of fixed sets
for regular column-weight-three LDPC codes with emphasis 
on the topological relations among them. For the sake of 
simplicity, we drop the term fixed sets and refer to these 
objects by the general term trapping sets. 

B. Trapping Set Ontology of Column-Weight-Three Codes for 
the Gallager A/B Algorithm on the BSC 

1) Graphical representation: The induced subgraph of a 
trapping set (or any set of variable nodes) is a bipartite 
graph. In the Tanner graph (bipartite graph) representation 
of a trapping set, we use • to represent variable nodes, 
■ to represent odd degree check nodes and □ to represent 
even degree check nodes. There exists an alternate graphical 
representation of trapping sets which allows their topological 
relations to be established more conveniently. This graphical 
representation is based on the incidence structure of lines and 
points. In combinatorial mathematics, an incidence structure
is a triple $(\mathcal{P}, \mathcal{L}, \mathcal{I})$ where $\mathcal{P}$ is a set of "points", $\mathcal{L}$ is a
set of "lines" and $\mathcal{I} \subseteq \mathcal{P} \times \mathcal{L}$ is the incidence relation.
The elements of $\mathcal{I}$ are called flags. If $(p, l) \in \mathcal{I}$, we
say that point p "lies on" line l. In this lines and points
(henceforth line-point) representation of trapping sets, variable 
nodes correspond to lines and check nodes correspond to 
points. A point is shaded black if it has an odd number of 
lines passing through it, otherwise it is shaded white. An (a, b) 
trapping set is thus an incidence structure with a lines and b 
black shaded points. To differentiate among (a, b) trapping sets 
that have non-isomorphic induced subgraphs when necessary, 
we index (a, b) trapping sets in an arbitrary order and assign 
the notation (a, b){i} to the (a, b) trapping set with index i. 
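
The line-point representation maps directly onto a simple data structure. In the sketch below, a trapping set is a list of lines and each line is a set of point labels; the names `trapping_set_label`, `eight_cycle` and `five_three`, as well as the point labels, are illustrative assumptions.

```python
from collections import Counter


def trapping_set_label(lines):
    """Return (a, b): the number of lines and the number of black points,
    i.e. points with an odd number of lines passing through them."""
    through = Counter(p for line in lines for p in set(line))
    a = len(lines)
    b = sum(1 for count in through.values() if count % 2 == 1)
    return a, b


# An eight cycle: four lines, consecutive lines share a white point "w*",
# and every line carries one black point "b*" of its own.
eight_cycle = [
    {"w0", "w1", "b0"},
    {"w1", "w2", "b1"},
    {"w2", "w3", "b2"},
    {"w3", "w0", "b3"},
]
# Adding a line through b0 and b2 (plus one new point) turns those two
# points white and contributes one new black point.
five_three = eight_cycle + [{"b0", "b2", "n0"}]

print(trapping_set_label(eight_cycle))  # (4, 4)
print(trapping_set_label(five_three))   # (5, 3)
```

The label (a, b) alone does not identify the subgraph, which is why the index i in the notation (a, b){i} is needed.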

Depending on the context, a trapping set can be understood 
as a set of variable nodes in a given code with a specified 
induced subgraph or it can be understood as a specific sub- 
graph independent of a code. To differentiate between these 
two cases, we use the letter $\mathbf{T}$ to denote a set of variable nodes
in a code and use the letter $\mathcal{T}$ to denote a type of trapping
set which corresponds to a specific subgraph. If the induced
subgraph of a set of variable nodes $\mathbf{T}$ in the Tanner graph of
a code C is isomorphic to the subgraph of $\mathcal{T}$ then we say that
$\mathbf{T}$ is a $\mathcal{T}$ trapping set or that $\mathbf{T}$ is a trapping set of type $\mathcal{T}$. C
is said to contain $\mathcal{T}$ trapping set(s).

Example 3: The (5, 3){1} trapping set $\mathcal{T}_1$ is a union of a
six cycle and an eight cycle, sharing two variable nodes. The
Tanner graph representation of $\mathcal{T}_1$ is shown in Fig. 1(a). The
set of odd degree check nodes is $\{c_7, c_8, c_9\}$. These check
nodes are represented by black shaded squares. In the line-point
representation of $\mathcal{T}_1$, which is shown in Fig. 1(b), $c_7$, $c_8$
and $c_9$ are represented by black shaded points. These points
are the only points that lie on a single line. The five variable
nodes $v_1, v_2, \ldots, v_5$ are represented by black shaded circles
in Fig. 1(a). They correspond to the five lines in Fig. 1(b). As
an example, the column-weight-three MacKay random code
of length 4095 [40] has 19617 sets of variable nodes whose
induced subgraphs are isomorphic to the subgraph of $\mathcal{T}_1$. These
sets of variable nodes are (5,3){1} trapping sets.

Remark: To avoid confusion between the graphical representations
of trapping sets, we note that the Tanner graph representation
of a trapping set always contains □ or ■. The line-point
representation never contains □ or ■. In the remainder
of this paper, we only use the line-point representation.

Fig. 1. Graphical representation of the (5,3){1} trapping set: (a) Tanner
graph representation, (b) line-point representation.

2) Topological relation: The following definition gives the 
topological relations among trapping sets. 

Definition 2: A trapping set $\mathcal{T}_2$ is a successor of a trapping
set $\mathcal{T}_1$ if there exists a proper subset of variable nodes of $\mathcal{T}_2$
that induces a subgraph isomorphic to the induced subgraph
of $\mathcal{T}_1$. If $\mathcal{T}_2$ is a successor of $\mathcal{T}_1$ then $\mathcal{T}_1$ is a parent of $\mathcal{T}_2$.
Furthermore, $\mathcal{T}_2$ is a direct successor of $\mathcal{T}_1$ if it does not have
a parent $\mathcal{T}_3$ which is a successor of $\mathcal{T}_1$.

The topological relation between $\mathcal{T}_1$ and $\mathcal{T}_2$ is solely dictated
by the topological properties of their subgraphs. In the Tanner
graph of a code C, the presence of a trapping set $\mathbf{T}_1$ does not
indicate the presence of a trapping set $\mathbf{T}_2$. If $\mathbf{T}_1$ is indeed a
subset of a trapping set $\mathbf{T}_2$ in the Tanner graph of C then we
say that $\mathbf{T}_1$ generates $\mathbf{T}_2$, otherwise we say that $\mathbf{T}_1$ does not
generate $\mathbf{T}_2$.

3) Family tree of trapping sets: Theorem 2 implies that
every trapping set contains at least one cycle. To show this,
assume that $\mathcal{T}$ is a trapping set that does not contain a cycle;
then the induced subgraph of $\mathcal{T}$ is a tree. Take any variable
node as the root of the tree; then the variable nodes
neighboring the leaf nodes of largest depth have only
one check node with degree greater than 1. Therefore, these
variable nodes have no fewer odd degree check nodes than even
degree check nodes, so condition (a) of Theorem 2 fails for
column-weight-three codes. This indicates that $\mathcal{T}$ is not a trapping
set, which is a contradiction. Consequently, all trapping sets
can be obtained by adjoining variable nodes to cycles. Note
that any cycle is a trapping set for regular column-weight-three
codes.





Fig. 2. Obtaining (5, 3) trapping sets by adding a line to the (4, 4) trapping
set: (a) the (5,3){1} trapping set and (b) the (5, 3){2} trapping set.

We now explain how larger trapping sets can be obtained 
by adjoining variable nodes to smaller trapping sets. We begin 
with the simplest example: the evolution of (5, 3) trapping 
sets from the (4,4) trapping set for regular column-weight-three
codes. We know that compared to the (4, 4) trapping set,






which is an eight cycle, a (5, 3) trapping set has one additional 
variable node. Therefore, if a (5, 3) trapping set is a successor 
of the (4, 4) trapping set, then its line-point representation can 
be obtained by adding one additional line to the line-point 
representation of the (4, 4) trapping set. Since it is required 
that the addition of the new line preserves the variable node 
degree (or number of points lying on a line), there must be 
exactly three points lying on the new line. Therefore, we can 
consider the process of adding a new line as the merging of 
at least one point on the new line with certain points in the 
line-point representation of the (4, 4) trapping set. We use 
© to denote points on the line that are to be merged with 
points in the line-point representation of the parent trapping 
set. The merging is demonstrated in Fig. 2 and is explained
as follows. If a black shaded point is merged with a © point
then they become a single white shaded point. Similarly, if a
white shaded point is merged with a © point then the result is
a single black shaded point. Recall that there must be exactly
three black shaded points in the line-point representation of a
(5, 3) trapping set. In addition, by condition (a) of Theorem 2,
every line must pass through at least two white shaded points.
The only way to satisfy these two conditions is to merge two
points of the new line with two black shaded points of the
(4, 4) trapping set. Up to isomorphism, there are two
distinct ways to select two black shaded points, resulting in
two different (5,3) trapping sets.
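
The merging step can be reproduced by brute force. The sketch below (with hypothetical helper names `eight_cycle`, `black_points`, `add_line` and `has_six_cycle`) adds a new line to the eight cycle for each of the six pairs of black points and separates the outcomes by whether a six cycle appears. By Example 3, the outcomes with a six cycle correspond to the (5,3){1} trapping set; the remaining ones have girth eight and correspond to the (5,3){2} trapping set.

```python
from collections import Counter
from itertools import combinations


def eight_cycle():
    # Line i passes through white points ("w", i) and ("w", i + 1 mod 4)
    # and through its own black point ("b", i).
    return [frozenset({("w", i), ("w", (i + 1) % 4), ("b", i)}) for i in range(4)]


def black_points(lines):
    through = Counter(p for line in lines for p in line)
    return sorted(p for p, count in through.items() if count % 2 == 1)


def add_line(lines, merged):
    """Add a three-point line: two points merged with `merged`, one new point."""
    return lines + [frozenset(merged) | {("new", 0)}]


def has_six_cycle(lines):
    """True if three lines meet pairwise in three distinct points."""
    for l1, l2, l3 in combinations(lines, 3):
        for p in l1 & l2:
            for q in l2 & l3:
                for r in l1 & l3:
                    if len({p, q, r}) == 3:
                        return True
    return False


base = eight_cycle()
results = [add_line(base, pair) for pair in combinations(black_points(base), 2)]
with_six_cycle = sum(has_six_cycle(r) for r in results)
print(len(results), "choices,", with_six_cycle, "with a six cycle,",
      len(results) - with_six_cycle, "with girth eight")
```

Choosing black points on adjacent lines closes a six cycle, while choosing black points on opposite lines does not, which is the source of the two non-isomorphic (5, 3) trapping sets.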

The evolution of a trapping set for regular column-weight-three
codes from its parent can now be described in a more
general setting. Since every trapping set of interest is a direct
successor of some trapping set, it is sufficient to consider only
the evolution of direct successors. Consider an (a, b) trapping
set $\mathcal{T}_1$. Since $\mathcal{T}_1$ has a variable nodes, its line-point
representation contains a lines. Each line has three points lying on it,
with at most one point shaded black. There are b black shaded points,
each of which has an odd number of lines passing through it. The
line-point representation of an (a + u, b + z) trapping set $\mathcal{T}_2$ can
be obtained by adding u lines to the line-point representation
of $\mathcal{T}_1$. These u new lines (and the points on them) form an
incidence structure and, since $\mathcal{T}_2$ is a direct successor of $\mathcal{T}_1$,
this incidence structure is connected (see footnote 4). For simplicity,
let us only consider elementary trapping sets. Then it can be shown
that the incidence structure formed by the u new lines can
only be one of those listed in Fig. 3. A successor trapping set
$\mathcal{T}_2$ is obtained by merging, pairwise, the © points with certain
points of $\mathcal{T}_1$. We remark that non-elementary successors can
be obtained by a very similar process with small additional
complexity.

4) Example: Let us consider regular column-weight-three 
LDPC codes. For simplicity, we only consider codes of girth 
g = 8 and elementary trapping sets, although this example 
can be generalized to include codes of other girths and non- 
elementary trapping sets. 

• With the evolution of the (5, 3){2} trapping set presented
above, we show the family tree of (a, b) trapping sets
originating from the (5, 3){2} trapping set with a ≤ 8
and b > 0 in Fig. 5.



Fig. 3. Possible incidence structures formed by u new lines for elementary
trapping sets.

Fig. 4. Obtaining a larger trapping set by adding lines to a smaller one.



• By selecting two black shaded nodes in Fig. 6(a) and
merging them with two © nodes in Fig. 3(c), a (6,4)
trapping set can be obtained. Two distinct ways to select
black shaded nodes result in two different trapping sets:
the (6,4){1} trapping set shown in Fig. 7(a) and the
(6, 4){2} trapping set shown in Fig. 6(b). The merging is
demonstrated in Fig. 4(a) and (b). The family tree of (a, b)
trapping sets originating from the (6, 4){1} trapping set
with a ≤ 8 and b > 0 is illustrated in Fig. 7.
• In the same manner, other direct successors of the (4, 4)
trapping set can be generated. Those (a, b) trapping sets
with a ≤ 8 and b > 0 are shown in Fig. 6. For a more
complete list of trapping sets from the TSO, interested
readers are referred to (ST).

Remarks: A trapping set may originate from different parents.
For example, the (6, 4){2} trapping set is not only a
direct successor of the (4, 4) trapping set but also a direct
successor of the (5, 5) trapping set. The evolution of the (6, 4){2}
trapping set from the (5,5) trapping set is demonstrated in Fig. 4(c).

5) Codewords: Let y be a codeword of C and let $\mathbf{T}$ =
supp(y). It is clear that $\mathbf{T}$ is an (a, 0) trapping set where
a = |supp(y)|. Conversely, C contains codewords of Hamming
weight a if the Tanner graph of C contains (a, 0) trapping
sets. It is also clear that C has $d_{min}$ as its minimum distance
if and only if (i) the Tanner graph of C contains no (a, 0)
trapping set where a < $d_{min}$ and (ii) the Tanner graph of
C contains at least one ($d_{min}$, 0) trapping set. For regular
column-weight-three codes, an (a, 0) trapping set is a direct
successor of an (a - 1, 3) trapping set. Consequently, the
line-point representation of an (a, 0) trapping set is obtained by
merging, pairwise, three black shaded nodes in the line-point
representation of an (a - 1, 3) trapping set with the three © nodes
in Fig. 3(a). The line-point representations of all possible (a, 0)
trapping sets where a ≤ 10 of girth 8 codes are shown in Fig. 8.

4 Each incidence structure corresponds to a bipartite graph. An incidence
structure is connected if the corresponding bipartite graph is connected.

Fig. 5. The (5,3){2} trapping set and its successors of size at most 8 in
girth 8 LDPC codes: (a) (5,3){2}, (b) (6,2){1}, (c) (7,1){1}, (d) (8,2){1},
(e) (7,3){1}, (f) (8,2){2}, (g) (8,4){1}.

Fig. 6. The (4, 4) trapping set and its direct successors of size at most 8
in girth 8 LDPC codes (excluding the (5,3){2} and (6, 4){1} trapping sets
shown in Fig. 2(b) and Fig. 7(a), respectively): (a) (4,4), (b) (6,4){2},
(c) (7,5){1}, (d) (7,5){2}, (e) (8,6){1}, (f) (8,6){2}, (g) (8,6){3}.

Fig. 7. The (6, 4){1} trapping set and its successors of size at most 8 in
girth 8 LDPC codes. Legible panel labels: (a) (6,4){1}, (b) (7,3){2},
(c) (8,2){3}, (g) (8,4){2}, (h) (8,4){3}, (i) (8,4){4}.

V. Searching for Subgraphs in a Tanner Graph 

In this section, we briefly describe the main idea behind 
our techniques of searching for elementary trapping sets from 
the TSO in the Tanner graph of a regular column-weight- 
three LDPC code. An efficient search of the Tanner graph 
for trapping sets relies on the topological relations among 
trapping sets defined in the TSO and/or carefully analyzing 
their induced subgraphs. Trapping sets are searched for in a 
way similar to how they have evolved in the TSO. A bigger 
trapping set can be found in a Tanner graph by expanding a 
smaller trapping set. More precisely, given a trapping set $\mathbf{T}_1$ of
type $\mathcal{T}_1$ in the Tanner graph of a code C, our techniques search
for a set of variable nodes such that the union of this set with
$\mathbf{T}_1$ forms a trapping set $\mathbf{T}_2$ of type $\mathcal{T}_2$, where $\mathcal{T}_2$ is a successor
of $\mathcal{T}_1$. Our techniques are sufficient to efficiently search for a
large number of trapping sets in the TSO, especially for those
to be avoided in the code constructions that we will present
in subsequent sections. They can be easily expanded to search
for other trapping sets as well. Details on the implementation
of these techniques are given in Appendix A.

Fig. 8. All (a, 0) trapping sets where a ≤ 10 in girth 8 LDPC codes:
(a) (6,0){1}, (b) (8,0){1}, (c) (8,0){2}, (d) (10,0){1}, (e) (10,0){2},
(f) (10,0){3}, (g) (10,0){4}, (h) (10,0){5}, (i) (10,0){6}.
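
As a rough illustration of the expansion idea (not the implementation of Appendix A, which is outside this section), the sketch below grows a known (4, 4) trapping set by one variable node. The helper names `ab_label` and `expand_by_one`, the threshold of two shared odd degree checks, and the toy matrix are assumptions of this example.

```python
def ab_label(H, T):
    """(a, b) of the subgraph induced by the variable nodes in T."""
    deg = [sum(row[v] for v in T) for row in H]
    return len(T), sum(1 for d in deg if d % 2 == 1)


def expand_by_one(H, T1, min_shared=2):
    """Variable nodes outside T1 that are joined to at least `min_shared`
    odd degree check nodes of the subgraph induced by T1."""
    T1 = set(T1)
    deg = [sum(row[v] for v in T1) for row in H]
    odd = [c for c, d in enumerate(deg) if d % 2 == 1]
    return [v for v in range(len(H[0]))
            if v not in T1 and sum(H[c][v] for c in odd) >= min_shared]


# Hypothetical graph: eight cycle on variable nodes 0..3, variable node 4
# attached to two of its odd degree checks, variable node 5 attached to one.
cols = [{i, (i + 1) % 4, 4 + i} for i in range(4)] + [{4, 6, 8}, {5, 9, 10}]
H = [[1 if r in col else 0 for col in cols] for r in range(11)]

T1 = {0, 1, 2, 3}
for v in expand_by_one(H, T1):
    print("adding variable node", v, "gives label", ab_label(H, T1 | {v}))
```

Only candidates that touch the odd degree checks of the smaller set need to be examined, which is what keeps a search guided by parent-successor relations far cheaper than testing arbitrary subsets of variable nodes.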

It is necessary to mention existing methods of searching for
trapping sets in the Tanner graph of a code. It is well-known
that this problem is NP-hard [41], [42]. Previous work on
this problem includes exhaustive [43], [44] and non-exhaustive
approaches [45], [46]. The main drawback of existing exhaustive
approaches is high complexity. Consequently, constraints
must be imposed on trapping sets and on the Tanner graph
in which trapping sets are searched for. For example, the
method in [43] can only search for (a, b) trapping sets with
b < 2 in a Tanner graph with fewer than 500 variable nodes.
The complexity is much lower for non-exhaustive approaches.
However, these approaches cannot guarantee that all trapping
sets are enumerated, and hence are not suitable for the purpose
of this paper.

VI. Construction of Codes Free of Small 
Trapping Sets 

Let us begin this section by summarizing the paper up until 
this point. We have given the description for a general class 
of LDPC codes whose parity check matrices are arrays of 
permutation matrices obtained from Latin squares. We have 
also presented our database of trapping sets of regular column- 
weight-three codes for the Gallager A/B algorithm on the 
BSC. Subgraphs of these trapping sets are identified by many 
researchers as the main cause of error floor for various iterative 
decoding algorithms and channels. Methods of searching for
these subgraphs in the Tanner graph of a code have also been 
presented. We therefore have all theoretical tools necessary to 
proceed to code construction. 

In this section, we give a general method to construct regular 
LDPC codes free of a given collection of trapping sets. More 
precisely, codes are constructed so that their Tanner graphs 
are free of a given collection of subgraphs from the TSO. 
Therefore, in this context, an (a,b){i} trapping set should be 
understood as a specific subgraph and not as a set of non- 
eventually correct variable nodes. It is important to note that 
our method of constructing codes free of small trapping sets 
can be applied to any class of codes, and not just the new class 
of codes proposed in this paper. For example, the progressive
edge growth (PEG) method [47] can be modified to construct a
random code whose Tanner graph is free of certain subgraphs.
Similarly, the method of progressively constructing a Tanner 
graph described below can be applied to construct any code 
whose parity check matrix is an array of permutation matrices. 
However, we restrict ourselves to constructing codes defined in
Section III in this paper, for the sole purpose of demonstrating
the excellent behavior of this newly proposed class of codes.

We organize our discussion by considering two separate
problems: (i) determining a collection of forbidden subgraphs,
i.e., which subgraphs should be avoided in the Tanner
graph, and (ii) constructing a Tanner graph which is free of a
given collection of subgraphs. Let us begin with the second
problem.

A. Construction of a Code by Progressively Building the 
Tanner Graph 

We give a progressive construction of a $(d_v, d_c)$ regular
LDPC code whose parity check matrix is an array of permutation
matrices. Our construction algorithm is inspired by the
PEG algorithm [47] and the method in [13]. Let C be a $(\gamma, \rho)$
regular LDPC code whose parity check matrix $\mathcal{H} = f(W)$
is an array of permutation matrices. The condition that a
Tanner graph is free of a given collection of subgraphs can
be understood as a set of constraints imposed on that Tanner
graph. Assume that the Tanner graph G corresponding to $\mathcal{H}$
is required to satisfy a set of constraints. Let r denote this set
of constraints.

The construction is based on a check and select-or-disregard
procedure. The Tanner graph of the code is built in $\rho$ stages,
where $\rho$ is the row weight of $\mathcal{H}$ ($\rho$ is the number of columns of
W). Usually, $\rho$ is not pre-specified, and a code is constructed
to have as high a rate as possible. Determining the exact
rate is beyond the scope of this paper. At each stage, a set
of |Q| new variable nodes is introduced that are initially not
connected to the check nodes of the Tanner graph. Blocks
of edges are then added to connect the new variable nodes
and the check nodes. Each block of edges corresponds to a
permutation matrix and hence corresponds to an element of Q.
An element of Q may be chosen randomly, or it may be chosen
in a predetermined order. After a block of edges is tentatively
added, the Tanner graph is checked for condition r. If the
condition r is violated, then that block of edges is removed
and replaced by a different block. The algorithm proceeds until
no block of edges can be added without violating condition
r. Details of the construction are given in Algorithm 1. For
mathematical convenience, we append a symbol $\psi$ to the
quasigroup Q and define $f(\psi) = Z$, the all-zero matrix of
dimension |Q| × |Q|. Also let $\Psi$ be a $\gamma \times 1$ matrix all of whose
elements are $\psi$, where $\gamma$ is the column weight of the code to
be constructed.

Algorithm 1 Progressively Building the Tanner Graph 



W <— 7 x 1 all zero matrix; p 
while w ltP ^ V do 

while S ^ & i < 7 do 



1 



<- 1; p «- p + 1; 



if /(W) satisfies r then 

i <- i + 1; 
end if 

end while 
end while 

p <— p — 1; Delete the last column of W; 
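
The check and select-or-disregard idea can be sketched in a few lines of Python. This is a simplified illustration and not a transcription of Algorithm 1: circulant permutation blocks over the integers mod q stand in for the Latin-square-based map f, the constraint r is taken to be "no four cycles", and entries are chosen greedily without backtracking. All function names are hypothetical.

```python
import random


def block_neighbors(shifts, q):
    """Check neighborhoods of the q variable nodes of one block column.

    Entry i of `shifts` selects a q x q circulant permutation block: variable
    node x is joined to check node (i, (x + shift) mod q).
    """
    return [frozenset((i, (x + s) % q) for i, s in enumerate(shifts))
            for x in range(q)]


def no_four_cycles(W, q):
    """Example constraint r: no two variable nodes share two check nodes."""
    nbrs = [nb for col in W for nb in block_neighbors(col, q)]
    return all(len(nbrs[a] & nbrs[b]) <= 1
               for a in range(len(nbrs)) for b in range(a + 1, len(nbrs)))


def build(q, gamma, satisfies, rng):
    """Add block columns one at a time until some entry cannot be filled."""
    W = []                                # completed columns of W
    while True:
        column = []                       # tentative new column
        for _ in range(gamma):
            candidates = list(range(q))
            rng.shuffle(candidates)       # random choice of the next element
            for s in candidates:
                if satisfies(W + [column + [s]], q):   # check ...
                    column.append(s)                   # ... and select
                    break
            else:                         # every element was disregarded
                return W                  # drop the unfinished column
        W.append(column)


W = build(q=7, gamma=3, satisfies=no_four_cycles, rng=random.Random(0))
print(len(W), "block columns:", W)
```

Swapping `no_four_cycles` for a test that also rejects chosen trapping sets would give the kind of constraint set used in the constructions discussed here; the cost of that test dominates the running time.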



The complexity of the algorithm grows exponentially with 
the column weight. The speed of practical implementation of 
the algorithm also depends strongly on how the condition r 
is checked on a Tanner graph. However, for small column 
weights, say 3 or 4, and small to moderate code lengths, the 
algorithm is well handled by state-of-the-art computers. For
example, with the searching techniques described in Section
V, the construction of a (1111,808) code which has girth 8,
minimum distance at least 10 and which contains no (5, 3){2} 
trapping set takes less than 2 minutes on a 2.4 GHz computer. 

Remarks: 

• It is worth mentioning an alternative approach in which 
a subgraph is described by a system of linear equations. 
Elements of a given matrix W are particular values of 
variables of these systems of equations. The Tanner graph 
corresponding to f(W) contains the given subgraph if 
and only if elements of W form a proper solution of at 
least one of these linear systems of equations. For array 
LDPC codes, equations governing cycles and several 
small subgraphs have been derived in p"4) and |37|. 
However, the problem of finding W such that its elements 
do not form a proper solution of any of these systems of 
equations is notoriously difficult. 

• The above code construction can be alternatively described
as a process of progressively constructing an
incidence structure. The construction begins with an
incidence structure consisting of points with no lines.
Blocks of parallel lines are then added based on a check
and select-or-disregard procedure, similar to that in [12] and
[13].



B. Determining the Collection of Forbidden Subgraphs 

Now we give a general rationale for deciding which trapping 
sets should be forbidden in the Tanner graph of a code. As 
previously mentioned, these trapping sets are chosen from
the TSO. It is clear that if a parent trapping set is not 
present in a Tanner graph, then neither are its successors. 
Since the size of a parent trapping set is always smaller than 
the size of its successors, a code should be constructed so 
that it contains as few small parent trapping sets as possible. 
However, forbidding smaller trapping sets usually imposes 
stricter constraints on the Tanner graph, resulting in a large 
rate penalty. This trade off between the rate and the choice of 
forbidden trapping sets is also a trade off between the rate and 
the error floor performance. While an explicit formulation of 
this trade off is difficult, a good choice of forbidden trapping 
sets requires the analysis of decoder failures to reveal the 
relative harmfulness of trapping sets. It has been pointed out 
that for the BSC, the slope of the frame error rate (FER)
curve in the error floor region depends on the size of the smallest
error patterns uncorrectable by the decoder [16]. We therefore
introduce the notion of the relative harmfulness of trapping 
sets in a general setting as follows. 

Assume that under a given decoding algorithm, a code is
capable of correcting any error pattern of weight $\vartheta$ but fails
to correct some error patterns of weight $\vartheta + 1$. If the failures
of the decoder on error patterns of weight $\vartheta + 1$ are due to
the presence of $(a_1, b_1)$ trapping sets of type $\mathcal{T}_1$ then $\mathcal{T}_1$ is the
most harmful trapping set. Let us now assume that a code is
constructed so that it does not contain $\mathcal{T}_1$ trapping sets and is
capable of correcting any error pattern of weight $\vartheta + 1$. If the
presence of $(a_2, b_2)$ trapping sets of type $\mathcal{T}_2$ leads to decoding
failure on some error patterns of weight $\vartheta + 2$ then $\mathcal{T}_2$ is the
second most harmful trapping set. The relative harmfulness
of other trapping sets is also determined in this manner.

Example 4: Let us consider a regular column-weight-three
LDPC code of girth 8 on the BSC and assume the Gallager
A/B decoding algorithm. Since such a code can correct any
error pattern of weight two, we want to find subgraphs whose
presence leads to decoding failure on some error patterns of
weight three. Since a code cannot correct three errors if its
Tanner graph contains either (5, 3){2} trapping sets or
(8, 0){1} trapping sets, the most harmful trapping sets are the
(5, 3){2} trapping set and the (8,0){1} trapping set.

To further explain the importance of the notion of relative
harmfulness, let us slightly detour from our discussion and
revisit the notion of a trapping set. A trapping set is defined
as a set of variable nodes that are not eventually correct.
Because trapping sets are defined in this way, it is indeed
possible, in some cases, to identify some small trapping
sets in a code by simulation, assuming the availability of a
fast software/hardware emulator. Unfortunately, trapping sets
identified in this manner generally have little significance
for code construction. This is because the dynamics of an
iterative decoder (except the Gallager A/B decoder on the
BSC) are usually very complex, and the mechanism by which
the decoder falls into a trapping set is difficult to analyze and is
not well understood. In many cases, the subgraphs induced by
sets of non-eventually correct variable nodes are not the most
harmful ones. For example, the (6,4) trapping sets shown in
Fig. 6(b) and Fig. 7(a) were identified in [37] to be among the most
dominant trapping sets. However, our analysis, which will be
presented in Section VIII, indicates that they are not the most
harmful ones. Although avoiding subgraphs induced by sets
of non-eventually correct variable nodes might lead to a lower
error floor, the code rate may be excessively reduced. A better
solution is to increase the slope of the FER curve with the
fewest possible constraints on the Tanner graph. This can only
be done by avoiding the most harmful trapping sets.

Nevertheless, except for the case of the Gallager A/B
algorithm on the BSC, in which the relative harmfulness of a
trapping set is determined by its critical number, determining
the relative harmfulness of trapping sets in general is a difficult
problem. The original concept of harmfulness of a trapping
set can be found in early work on LDPC codes as well as
importance sampling methods to analyze error floors. MacKay
and Postol [39] were the first to discover that certain "near
codewords" are to be blamed for the high error floor in the
Margulis code on the AWGNC. Richardson [15] reproduced
their results and developed a computation technique to predict
the performance of a given LDPC code in the error floor
domain. He characterized the troublesome noise configurations
leading to the error floor using trapping sets and described
a technique (of a Monte-Carlo importance sampling type) to
evaluate the error rate associated with a particular class of
trapping sets. Cole et al. [48] further developed the importance
sampling based method to analyze error floors of moderate
length LDPC codes, and we used instantons to predict the error
floors [26]-[28].

The main idea of our method is to determine the relative
harmfulness of trapping sets from the TSO for the SPA on
the BSC. It relies on the topological relationship among these
trapping sets and will be presented in Section VIII. Before
presenting this method, we describe the construction of codes
for the Gallager A/B algorithm on the BSC in the next section.

VII. LDPC Codes for the Gallager A/B Algorithm 
on the BSC 

The error correction capability of regular column-weight-three
LDPC codes has been studied in [24], [25], [49] and
can be summarized as follows.

• A column-weight-three LDPC code with Tanner graph of
girth g cannot correct all error patterns of weight g/2.

• A column-weight-three LDPC code with Tanner graph of
girth g ≥ 10 can always correct g/2 - 1 errors.

• A column-weight-three LDPC code with Tanner graph of 
girth g = 6 can correct any two errors if and only if the 
Tanner graph does not contain a codeword of weight four. 

• A column-weight-three LDPC code with Tanner graph 
of girth g = 8 can correct any three errors if and only if 
(i) the Tanner graph does not contain (5,3){2} trapping 
sets and (ii) the Tanner graph does not contain (8,0){1} 
trapping sets. 

The above conditions completely determine the set of constraints
r to be imposed on the Tanner graph of a code to
achieve a given error floor performance. The necessary and
sufficient conditions to correct three errors were derived in
[49]. These conditions require that the Tanner graph of the
code has girth g = 8 and does not contain (5,3) and (8,0)
trapping sets. It is obvious that the (5, 3) trapping set is indeed
the (5, 3){2} trapping set. The (8,0) trapping set should be
understood as the (8,0){1} trapping set since it can be shown
easily that the critical number of the (8, 0){2} trapping set is
four. In the following example, we present the
construction of a code which can correct three errors.

Fig. 9. Frame error rate performance of the Tanner code and code $C_1$ under
the Gallager A algorithm on the BSC with maximum 50 iterations
(horizontal axis: crossover probability).

Example 5 (Codes that can correct 3 errors): The Tanner
code of length 155 [6] is a (3,5) regular LDPC code. This
code contains (5, 3){2} trapping sets and hence cannot correct
three errors under the Gallager A/B algorithm on the BSC. Let
q = 31 and let $\alpha$ be a primitive element of GF(q). Let $C_1$ be an
LDPC code defined by the parity check matrix $\mathcal{H} = f(U_1)$
where

$U_1$ =

$C_1$ is a (155, 64) LDPC code with girth g = 8 and minimum
distance $d_{min} = 12$. The Tanner graph of $C_1$ contains no
(5, 3){2} trapping sets and, since $d_{min} = 12$, no (8, 0){1}
trapping sets. Therefore, $C_1$ is capable of correcting
any three-error pattern under the Gallager A/B algorithm on
the BSC. The FER performance of $C_1$ under the Gallager
A/B algorithm over the BSC is shown in Fig. 9. The FER
performance of the Tanner code is also shown for comparison.

We end this section with a remark on the harmfulness of
two trapping sets with the same critical number. If two types
of trapping sets $\mathcal{T}_1$ and $\mathcal{T}_2$ have the same critical number $\vartheta$,
then the one with the larger number of inducing sets of size $\vartheta$
is more harmful. An inducing set of a trapping set is a set of
variable nodes such that if these variable nodes are initially in
error then the decoder will fail on the trapping set (see [30]
for a more detailed discussion).

VIII. LDPC Codes for the SPA on the BSC 

In this section, we present the construction of regular 
column-weight-three codes for the SPA on the BSC. The main 
element of the construction is the determination of the set of 
most harmful trapping sets. Following the discussion of the
notion of relative harmfulness in Section VI-B, we approach
this problem as follows.

Let us consider an LDPC code $\mathcal{C}$ and assume that $\mathcal{C}$ can
correct any error pattern of weight $\vartheta$ under the SPA on the
BSC. We are interested in determining the trapping sets whose
presence leads to decoding failure on error patterns of weight
$\vartheta + 1$. To simplify this problem, we focus only on initial error
patterns of weight $\vartheta + 1$ that surely lead to decoding failures
of the Gallager A/B algorithm on the BSC. The basis for this
simplification is as follows. Since it is well known that the
SPA has superior performance in both the waterfall
and error floor regions compared to the Gallager A/B
algorithm, we surmise that an error pattern correctable by the
Gallager A/B algorithm is correctable with high probability
by the SPA, although this remains unproven.
The initial error patterns of weight $\vartheta + 1$ that are surely
uncorrectable by the Gallager A/B algorithm can be easily
derived from the TSO.

Assume the transmission of an all-zero codeword and let $\mathbf{y}$
be the received vector input to the decoder. Also assume that
supp($\mathbf{y}$) = $T_1$, a trapping set of type $\mathcal{T}_1$ from the TSO with
$\vartheta + 1$ variable nodes. In other words, all the $\vartheta + 1$ initially
corrupt variable nodes belong to the trapping set $T_1$. This
error pattern results in a decoding failure of the Gallager A/B
algorithm and hence is an initial error pattern of interest. As
the decoder operates by passing messages along edges of the
Tanner graph, the decoding outcome depends heavily on the
immediate neighborhood of the subgraph induced by the variable
nodes in $T_1$. In many cases, a decoding failure will occur if
$T_1$ generates a trapping set $T_2$ of type $\mathcal{T}_2$, where $\mathcal{T}_2$ is a
successor of $\mathcal{T}_1$. In such cases, the presence of $T_2$ in a code
makes it incapable of correcting every error pattern of weight $\vartheta + 1$,
and hence $T_2$ is a harmful trapping set.

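To evaluate the harmfulness of the $\mathcal{T}_2$ trapping set, all initial
error patterns that consist of variable nodes of a $\mathcal{T}_1$ trapping
set must be considered. Let $\mathcal{S}$ be the set of all trapping sets of
type $\mathcal{T}_1$. Partition $\mathcal{S}$ into two disjoint sets $\mathcal{S}_1$ and $\mathcal{S}_2$ such that
a trapping set in $\mathcal{S}_1$ generates at least one $\mathcal{T}_2$ trapping set while
a trapping set in $\mathcal{S}_2$ does not generate any $\mathcal{T}_2$ trapping set.
For each trapping set $T_i \in \mathcal{S}$, perform decoding on the input
vector $\mathbf{y}_i$ where supp($\mathbf{y}_i$) = $T_i$, at a crossover probability $\epsilon$
of the channel. Let $\mathcal{S}^{c}$ be the set of trapping sets $T_i \in \mathcal{S}$ such
that decoding is successful upon error pattern $\mathbf{y}_i$. Define $\chi_1(\epsilon)$
and $\chi_2(\epsilon)$, the rates of successful decoding for trapping sets in
$\mathcal{S}_1$ and $\mathcal{S}_2$ at the crossover probability $\epsilon$ of the channel, as
follows:

$$\chi_1(\epsilon) = \frac{|\mathcal{S}_1 \cap \mathcal{S}^{c}|}{|\mathcal{S}_1|}, \qquad (7)$$

$$\chi_2(\epsilon) = \frac{|\mathcal{S}_2 \cap \mathcal{S}^{c}|}{|\mathcal{S}_2|}. \qquad (8)$$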


The harmfulness of $\mathcal{T}_2$ trapping sets of $\mathcal{C}$ is evaluated by
comparing $\chi_1(\epsilon)$ and $\chi_2(\epsilon)$ for a wide range of $\epsilon$. The larger
the difference $\chi_2(\epsilon) - \chi_1(\epsilon)$, the more harmful $\mathcal{T}_2$ trapping sets
are. The harmfulness of $\mathcal{T}_2$ trapping sets is also compared with
the harmfulness of other successor trapping sets of $\mathcal{T}_1$, which
is determined in the same fashion. Once the most harmful
trapping sets have been determined, a code is constructed so
that it does not contain these trapping sets.

Fig. 10. Hierarchy of trapping sets originating from the (4,4) trapping set
for regular column-weight-three codes of girth 8.
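The comparison behind (7) and (8) can be sketched in a few lines. This is only an illustration under stated assumptions: `generates_successor` and `decodes_ok` are hypothetical callables standing in for the topological test ("does this $\mathcal{T}_1$ trapping set generate a $\mathcal{T}_2$ trapping set?") and for one run of the decoder at crossover probability `eps`; neither is specified in this form in the paper.

```python
from typing import Callable, FrozenSet, Iterable, List, Tuple

def success_rates(
    trapping_sets: Iterable[FrozenSet[int]],
    generates_successor: Callable[[FrozenSet[int]], bool],  # hypothetical topological test
    decodes_ok: Callable[[FrozenSet[int], float], bool],    # hypothetical decoder run
    eps: float,
) -> Tuple[float, float]:
    """Return (chi_1, chi_2): decoding success rates over S_1 and S_2."""
    s1: List[FrozenSet[int]] = []  # sets that generate at least one successor
    s2: List[FrozenSet[int]] = []  # sets that generate none
    for t in trapping_sets:
        (s1 if generates_successor(t) else s2).append(t)

    def rate(group: List[FrozenSet[int]]) -> float:
        if not group:
            return float("nan")  # rate is undefined for an empty class
        return sum(decodes_ok(t, eps) for t in group) / len(group)

    return rate(s1), rate(s2)
```

Evaluating the function over a grid of `eps` values and comparing the two returned rates mirrors the procedure described above; a large gap `chi_2 - chi_1` would point to a harmful successor.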

We note that this characterization of relative harmfulness, 
although heuristic, plays a critical role in the construction 
of good high rate codes as no explicit quantification of 
harmfulness of trapping sets is known. This characterization 
of harmfulness also helps a code designer to determine more 
or less the exact subgraphs that are responsible for a certain 
type of decoding failure. It is therefore superior to searching 
for trapping sets by simulation. 

We continue our discussion with three case studies in which
we evaluate (i) the relative harmfulness of the (6, 2){1} and
(8, 2) trapping sets, (ii) the relative harmfulness of the (5, 3){2}
trapping set and (iii) the relative harmfulness of the (7, 3),
(9, 3) and (10, 2) trapping sets. For a better illustration of the
relationship among these trapping sets, a hierarchy of trapping
sets originating from the (4, 4) trapping set is shown in Fig.
10. For the first case, we present a detailed analysis. For the
other two cases, we only give the results of the analysis.
The analysis to be presented is a step towards the guaranteed
correction of four, five and six errors under the SPA on the
BSC. For simplicity, we assume that codes have girth $g = 8$
in all examples, although the method of construction can be
applied to girth 6 codes, which would likely result in higher rate codes.

A. The Harmfulness of the (6, 2){1} and (8, 2) Trapping Sets 

Since we consider codes with girth g = 8, let us start with 
an existing code of such girth. Consider the (530, 373) integer 
lattice code (or shortened array code p"4) ) given in JTJ) . This 



code has minimum distance $d_{\min} = 8$ and hence is unable
to correct all weight-four error patterns. Clearly, the first step
towards the guaranteed correction of four errors is to eliminate
the (6, 0){1}, (8, 0){1} and the (8, 0){2} trapping sets, which
are the low weight codewords. We therefore construct a code
with minimum distance $d_{\min} \geq 10$. Let $q = 53$ and let $\alpha$ be a
primitive element of GF($q$), and let r specify that the Tanner
graph of a code has girth $g = 8$ and contains no (6, 0){1},
(8, 0){1} and (8, 0){2} trapping sets. Using the method of
construction described in Section VI-A, we obtain a regular
column-weight-three code $\mathcal{C}_2$ with parity check matrix $\mathcal{H}_2 = f(\mathcal{W}_2)$
where

$\mathcal{W}_2 =$ [matrix entries not recoverable from the source]

$\mathcal{C}_2$ is a (530, 373) code. Similar to the above integer lattice
code, $\mathcal{C}_2$ has column weight 3, row weight 10 and rate $R = 0.7$.

The Tanner graph of $\mathcal{C}_2$ contains 17066 (4, 4) trapping
sets. We partition the collection of (4, 4) trapping sets into
nine disjoint sets $\mathcal{S}_1, \mathcal{S}_2, \ldots, \mathcal{S}_9$ based on whether a (4, 4)
trapping set generates (5, 3){2}, (6, 2){1}, (8, 2) or (10, 0)
trapping sets. Note that, for simplicity, we do not differentiate
among different (8, 2) and (10, 0) trapping sets in this analysis,
although a more detailed treatment may reveal some difference
in the harmfulness of those trapping sets. The classification
and sizes of different sets of (4, 4) trapping sets are shown in
Table I.

TABLE I

Disjoint Sets of (4, 4) Trapping Sets in the LDPC Code $\mathcal{C}_2$. A ✓ Indicates That the (4, 4) Trapping Sets in $\mathcal{S}_i$ Generate at Least One Corresponding Trapping Set; a Dash Indicates That They Generate None.

Set      (5,3){2}   (6,2){1}   (8,2)   (10,0)   Total
S_1         -          -         -       -       4982
S_2         -          -         -       ✓         53
S_3         ✓          -         -       -        424
S_4         -          -         ✓       -       7314
S_5         -          -         ✓       ✓        371
S_6         ✓          -         ✓       -       1855
S_7         ✓          -         ✓       ✓        106
S_8         ✓          ✓         -       -       1007
S_9         ✓          ✓         ✓       -        954
Total                                           17066



To evaluate the harmfulness of the (5, 3){2}, (6, 2){1},
(8, 2) and (10, 0) trapping sets, we perform decoding on all
input vectors $\mathbf{y}_i$ where supp($\mathbf{y}_i$) = $T_i$, a (4, 4) trapping set of
$\mathcal{C}_2$. The result is as follows. For the trapping sets in $\mathcal{S}_1$,
$\mathcal{S}_2$, $\mathcal{S}_3$ and $\mathcal{S}_7$, the decoder successfully decodes all input vectors
$\mathbf{y}_i$ at all the 250 values of $\epsilon$ that have been considered, i.e.,
$\chi_1(\epsilon) = \chi_2(\epsilon) = \chi_3(\epsilon) = \chi_7(\epsilon) = 100\%$ $\forall \epsilon$. For the trapping
sets in $\mathcal{S}_4$, $\mathcal{S}_5$, $\mathcal{S}_6$, $\mathcal{S}_8$ and $\mathcal{S}_9$, the rate of successful decoding
is shown in the form of a histogram in Fig. 11. As an example of
how to interpret the result, consider the trapping sets in $\mathcal{S}_4$.
It can be seen that there are about 160 values (65%) of $\epsilon$ at
which decoding is successful for all input vectors $\mathbf{y}_i$. For about
90 values (35%) of $\epsilon$, decoding is successful for approximately
nine out of ten input vectors $\mathbf{y}_i$.






Fig. 11. The rate of successful decoding for different sets of (4,4) trapping sets in code $\mathcal{C}_2$ (horizontal axis: successful decoding rate $\chi_i(\epsilon)$ in %; histograms shown for $i = 4, 5, 6, 8, 9$).



The following facts can be observed:

• The (4, 4) trapping sets in $\mathcal{S}_1$, $\mathcal{S}_2$ and $\mathcal{S}_3$ do not generate
either (6, 2){1} or (8, 2) trapping sets. The rate of
successful decoding is 100% for all tested values of $\epsilon$.

• The (4, 4) trapping sets in $\mathcal{S}_4$ and $\mathcal{S}_5$ generate at least
one (8, 2) trapping set. Decoding is not always successful,
but the rate of successful decoding is more than 90% for
all tested values of $\epsilon$.

• The (4, 4) trapping sets in $\mathcal{S}_8$ generate at least one
(6, 2){1} trapping set. The rate of successful decoding
$\chi_8(\epsilon)$ is significantly lower in general compared to
$\chi_4(\epsilon)$, $\chi_5(\epsilon)$ and $\chi_6(\epsilon)$.

• The (4, 4) trapping sets in $\mathcal{S}_9$ generate at least one
(6, 2){1} and one (8, 2) trapping set. The rate of successful
decoding $\chi_9(\epsilon)$ is lowest in general.

• The (4, 4) trapping sets in $\mathcal{S}_6$ generate at least one
(5, 3){2} trapping set while the ones in $\mathcal{S}_4$ do not. In
general, $\chi_6(\epsilon) < \chi_4(\epsilon)$.

• The (4, 4) trapping sets in $\mathcal{S}_5$ generate at least one (10, 0)
trapping set while the ones in $\mathcal{S}_4$ do not. In general,
$\chi_5(\epsilon) > \chi_4(\epsilon)$.

• The (4, 4) trapping sets in $\mathcal{S}_7$ generate at least one (10, 0)
trapping set while the ones in $\mathcal{S}_6$ do not. $\chi_7(\epsilon) = 100\%$
for all tested values of $\epsilon$.

The above observations strongly suggest that both (6, 2){1}
and (8, 2) trapping sets are harmful. However, the harmfulness
of the (6, 2){1} trapping set is much more evident than the
harmfulness of the (8, 2) trapping set. Besides, it is interesting
to notice that $\chi_7(\epsilon) = 100\%$ for all tested values of $\epsilon$. All
(4, 4) trapping sets in $\mathcal{S}_7$ generate at least one (8, 2) trapping
set, one (5, 3){2} trapping set and one (10, 0) trapping set. In
this case, the presence of (10, 0) trapping sets seems to "help"
decoding. This "positive" effect of (10, 0) trapping sets can
also be seen when comparing $\chi_5(\epsilon)$ and $\chi_4(\epsilon)$. Finally,
comparing $\chi_6(\epsilon)$ and $\chi_4(\epsilon)$ suggests that the (5, 3){2}
trapping sets have some negative effect on decoding if the
(4, 4) trapping set generates (8, 2) and (10, 0) trapping sets.



To further verify our prediction on the harmfulness of the
(6, 2){1} and (8, 2) trapping sets, we construct another code
with the same parameters as those of $\mathcal{C}_2$. We denote this code
by $\mathcal{C}_3$. The Tanner graph of $\mathcal{C}_3$ has stronger constraints than the
Tanner graph of $\mathcal{C}_2$ as it contains neither (6, 2){1} nor (10, 0)
trapping sets. Since (10, 0) trapping sets are not present, $\mathcal{C}_3$
has minimum distance at least 12.

Let $\mathcal{C}_3$ be defined by the parity check matrix $\mathcal{H}_3 = f(\mathcal{W}_3)$
where

$\mathcal{W}_3 =$ [matrix entries not recoverable from the source]

The Tanner graph of $\mathcal{C}_3$ contains 16483 (4, 4) trapping sets,
which can be partitioned into four disjoint sets as shown in
Table II.

TABLE II

Types of (4, 4) Trapping Sets in the (530, 373) LDPC Code $\mathcal{C}_3$

Set      (5,3){2}   (8,2)   Total
S_1         -         -      6890
S_2         ✓         -       795
S_3         -         ✓      6890
S_4         ✓         ✓      1908
Total                       16483



We again perform decoding on all input vectors $\mathbf{y}_i$ where
supp($\mathbf{y}_i$) = $T_i$, a (4, 4) trapping set of $\mathcal{C}_3$. The rate of
successful decoding for trapping sets in $\mathcal{S}_3$ and $\mathcal{S}_4$ is shown
in the form of a histogram in Fig. 12. For trapping sets in $\mathcal{S}_1$ and
$\mathcal{S}_2$, decoding is always successful.

Fig. 12. The rate of successful decoding for different sets of (4, 4) trapping
sets in code $\mathcal{C}_3$ (horizontal axis: successful decoding rate $\chi_i(\epsilon)$ in %).

It can be seen that the results are consistent with the previously
obtained results. Decoding is always successful for the
(4, 4) trapping sets which generate neither (6, 2){1} nor (8, 2)
trapping sets. Besides, $\chi_3(\epsilon) > \chi_4(\epsilon)$ in general since the
(4, 4) trapping sets in $\mathcal{S}_3$ do not generate (5, 3){2} trapping
sets. These results validate our prediction on the harmfulness
of successors of the (4, 4) trapping set. We have repeated the
experiment for a collection of codes whose Tanner graphs
do not contain either (6, 2){1} or (8, 2) trapping sets. The
consistency of the results led us to the following conjecture.

Conjecture 1: A regular column-weight-three code of girth 
g = 8 can correct any error pattern of weight 4 consisting of 
variable nodes of an eight cycle under the SPA on the BSC if 
its Tanner graph contains neither (6, 2) nor (8, 2) trapping sets.

We remark that this conjecture only gives a sufficient 
condition. A code may correct any error pattern of weight 4 
even if its Tanner graph contains (8, 2){2} trapping sets. For 
example, consider the Tanner code of length 155. The Tanner 
graph of this code does not contain (6,2){1} trapping sets,
but it contains (8, 2) {2} trapping sets. However, decoding is 
always successful for all the (4, 4) trapping sets at any value 
of e. It might be possible to find a better sufficient condition 
by taking into account bigger trapping sets, but such analysis 
appears to be difficult. 

Example 6: The FER performance of $\mathcal{C}_2$, $\mathcal{C}_3$ and the
(530, 373) integer lattice code under the SPA with 100 iterations
on the BSC is shown in Fig. 13. For comparison, Fig. 13
also shows the FER performance of a (530, 373) LDPC code
constructed using the PEG algorithm [47]. This PEG code has
girth $g = 6$ and minimum distance $d_{\min} = 6$. Clearly, $\mathcal{C}_3$,
whose Tanner graph is free of (6, 2) trapping sets, has the best
performance. Although the Tanner graph of $\mathcal{C}_2$ contains some
(8, 2) trapping sets, it still outperforms the PEG code. The
integer lattice code has the worst performance although it has
girth $g = 8$.

Example 7: Let $q = 3^4$ and let $\mathcal{C}_4$ be defined by the parity
check matrix $\mathcal{H}_4 = f(\mathcal{W}_4)$ where

$\mathcal{W}_4 =$ [matrix entries not recoverable from the source]

$\mathcal{C}_4$ is a (810, 569) code with column weight 3, row weight
10 and rate $R = 0.7$. The Tanner graph of $\mathcal{C}_4$ has girth $g = 8$
and does not contain either (6, 2) or (8, 2) trapping sets. The
FER performance of $\mathcal{C}_4$ under the SPA with 100 iterations on
the BSC is shown in Fig. 14. For comparison, Fig. 14 also
shows the FER performance of a (810, 567) PEG constructed
code. This code has girth $g = 8$. It can be seen that $\mathcal{C}_4$ has a
lower error floor than the PEG code.

Fig. 13. Frame error rate performance of codes in Example 6 under the SPA
on the BSC.

Fig. 14. Frame error rate performance of codes in Example 7 under the SPA
on the BSC (horizontal axis: crossover probability $\epsilon$).

B. The Harmfulness of the (5,3){2} Trapping Set

Assuming the guaranteed correction of four errors, we are
now interested in finding trapping sets whose presence leads to
decoding failure on some error patterns of weight five. There
are two trapping sets with five variable nodes from the TSO
that can be present in the Tanner graph of a regular column-weight-three
LDPC code with girth $g = 8$: the (5,3){2}
trapping set and the (5, 5) trapping set. The result of our
analysis indicates that (5, 3){2} trapping sets are the most
harmful and should be forbidden in the Tanner graph of a
code.

Example 8: Let $q = 211$ and let $\mathcal{C}_5$ be defined by the parity
check matrix $\mathcal{H}_5 = f(\mathcal{W}_5)$ where $\mathcal{W}_5$ is shown in (9). $\mathcal{C}_5$ is a
(3165, 2554) code with column weight 3, row weight 15 and
rate $R = 0.8$. The Tanner graph of $\mathcal{C}_5$ has girth $g = 8$ and
does not contain either (5, 3){2} or (8, 2) trapping sets. The
FER performance of $\mathcal{C}_5$ under the SPA with 100 iterations on
the BSC is shown in Fig. 15. For comparison, Fig. 15 also
shows the FER performance of a (3150, 2520) regular QC
LDPC code constructed using array masking proposed in [5].
The parity check matrix of this code is a 10 x 50 array of
63 x 63 circulants or zero matrices, which has column weight
3 and row weight 15. This code has girth $g = 8$. It can be
seen that $\mathcal{C}_5$ has a lower error floor than the code constructed
using array masking.

C. The Harmfulness of (7, 3), (9, 3) and (10, 2) Trapping Sets

With the above results, we now consider codes free of
(5, 3){2} and (8,2) trapping sets and aim for the guaranteed
correction of six errors. To guarantee the correction of six
errors, codes must have minimum distance $d_{\min} \geq 14$. In
other words, their Tanner graphs should be free of (a, 0)
trapping sets $\forall a \leq 12$. Similar to the previous discussions,
we analyze error patterns of weight six, focusing on those
consisting of variable nodes of a trapping set. There are three
trapping sets of size 6 from the TSO that can be present in the
Tanner graph of a regular column-weight-three LDPC code
with girth $g = 8$: the (6,6) trapping set, the (6,4){1} trapping
set and the (6,4){2} trapping set. The results of our analysis
and experiments suggest that the following trapping sets are
harmful (in decreasing order of harmfulness):

• The (7, 3){2} trapping set and the (7, 3){3} trapping set.

• The (10, 2) trapping sets which are successors of the
(9, 3) trapping sets below.

• The (9, 3) trapping sets which are successors of the
(8,4){2}, (8,4){3} and (8,4){4} trapping sets (see Fig.
10 for an illustration of the relationship among these
trapping sets).

Example 9: Let $q = 199$ and let $\mathcal{C}_6$ be defined by the parity
check matrix $\mathcal{H}_6 = f(\mathcal{W}_6)$ where $\mathcal{W}_6$ is shown in (10). $\mathcal{C}_6$ is a
(2388, 1793) code with column weight 3, row weight 12 and
rate $R = 0.75$. The Tanner graph of $\mathcal{C}_6$ has girth $g = 8$ and
does not contain either (5,3){2} or (7,3) trapping sets and
neither does it contain (10, 2) trapping sets that are generated
by either (8,4){2}, (8,4){3} or (8,4){4} trapping sets. The
FER performance of $\mathcal{C}_6$ under the SPA with 100 iterations
on the BSC is shown in Fig. 16. For comparison, Fig. 16 also
shows the FER performance of a (3150, 2518) PEG code. This
code has girth $g = 8$. It can be seen that $\mathcal{C}_6$ has a lower error
floor than the PEG code.

Example 10: Let $q = 337$ and let $\mathcal{C}_7$ be defined by the parity
check matrix $\mathcal{H}_7 = f(\mathcal{W}_7)$ where $\mathcal{W}_7$ is shown in (11). $\mathcal{C}_7$ is
a (4381, 3372) code with column weight 3, row weight 13
and rate $R = 0.77$. The Tanner graph of $\mathcal{C}_7$ has girth $g = 8$
and does not contain either (5, 3){2} or (7,3) trapping sets and
neither does it contain (9, 3) trapping sets that are generated
by either (8,4){2}, (8,4){3} or (8,4){4} trapping sets. The
FER performance of $\mathcal{C}_7$ under the SPA with 100 iterations
on the BSC is shown in Fig. 17. For comparison, Fig. 17 also
shows the FER performance of a (4381, 3370) PEG code. This
code has girth $g = 8$. It can be seen that $\mathcal{C}_7$ has a lower error
floor than the PEG code.

It is worth mentioning that the error rate performance of 
existing structured regular column-weight-three codes in the 
literature is at best comparable with the error rate performance 
of PEG constructed codes. All of our structured codes pre- 
sented in this paper outperform PEG constructed codes and 
hence they are candidates for the best known high rate short 
length regular column-weight-three LDPC codes.

IX. Discussion and Conclusion

Although the codes presented in this paper are optimized
for the BSC, they also have excellent performance on the
AWGNC. As a demonstration, we show the FER performance
of the code $\mathcal{C}_5$ from Example 8 under the SPA on the AWGNC
in Fig. 18. Recall that the Tanner graph of $\mathcal{C}_5$ has girth $g = 8$
and does not contain (5, 3){2} and (8, 2) trapping sets and that
the (3150, 2520) regular QC LDPC code was constructed using
array masking proposed in [5]. It can be seen that although $\mathcal{C}_5$
was constructed for the BSC, it outperforms the other code,
which was constructed for the AWGNC.

We have introduced a new class of structured LDPC codes 
with a wide range of rates and lengths. More importantly, 
we have proposed a method to construct codes whose Tanner 
graphs are free of small trapping sets. These trapping sets are 
selected based on their relative harmfulness for the decoding 
algorithms. We have also presented the constructions of regular 
column-weight-three codes. These codes have excellent perfor- 
mance on both the BSC and the AWGNC, although they were 
only optimized for the BSC. To the best of our knowledge, 
these codes outperform the best known short length structured 
LDPC codes. Our future work includes extending the TSO to 
include irregular codes and column-weight-four codes as well 
as the constructions of column-weight-four codes and irregular 
LDPC codes with low error floor. 



Appendix A 

Implementation of Techniques of Searching for 
Trapping Sets 

A. Subroutines 

We assume that the following simple subroutines are used 
in our search algorithms. 

• $Y$ = RowIntersectIndex($X_1$, $X_2$, $\vartheta$).

Let $X_1$ and $X_2$ be matrices. $Y$ is a matrix of two
columns. If $(i_1, i_2)$ is a row of $Y$ then the $i_1$-th row of
$X_1$ and the $i_2$-th row of $X_2$ share $\vartheta$ common entries.

• $Y$ = OddDegreeChecks($H$, $X$).

Let $H$ be the parity check matrix corresponding to a
Tanner graph $G$ of an LDPC code. Let $X$ be a matrix with
each row of $X$ giving a set of variable nodes. Assume that
all the subgraphs induced by variable nodes in rows of $X$
have the same number of odd degree check nodes. $Y$ is
a matrix with the same number of rows as $X$. Elements
of the $i$-th row of $Y$ are odd degree check nodes in the
subgraph induced by the variable nodes in the $i$-th row of
$X$.

• $Y$ = TotalChecksOfDegreeK($H$, $X$, $\vartheta$).

Let $H$ be the parity check matrix corresponding to a
Tanner graph $G$. Let $X$ be a matrix whose elements are
variable nodes in $G$. $Y$ is a one-column matrix with the
same number of rows as $X$. The element in the $i$-th row
of $Y$ is the number of check nodes with degree $\vartheta$ in the
subgraph induced by the variable nodes in the $i$-th row of
$X$.

• $Y$ = IsTrappingSet($H$, $X$).

Let $H$ be the parity check matrix corresponding to a
Tanner graph $G$. Let $X$ be a matrix whose elements are
variable nodes in $G$. $Y$ is a one-column matrix with the
same number of rows as $X$. The element in the $i$-th row
of $Y$ is 1 if the variable nodes in the $i$-th row of $X$ form
a trapping set and is 0 otherwise.

The above subroutines can be implemented using simple
sparse matrix operations and hence are of low complexity.
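As a rough illustration, three of the subroutines above might look as follows in plain Python. Several choices here are assumptions made for the sketch and are not taken from the paper: the parity check matrix is a dense list of 0/1 rows rather than a sparse matrix, all indices are 0-based, "share $\vartheta$ common entries" is read as "exactly $\vartheta$", and the snake_case names are illustrative. IsTrappingSet is left out because its test depends on the trapping set definition given earlier in the paper.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[int]]

def check_degrees(H: Matrix, variables: Sequence[int]) -> Counter:
    """Degree of each check node in the subgraph induced by `variables`."""
    deg: Counter = Counter()
    for v in variables:
        for c, row in enumerate(H):
            if row[v]:
                deg[c] += 1
    return deg

def odd_degree_checks(H: Matrix, X: Sequence[Sequence[int]]) -> List[List[int]]:
    """Row i lists the odd degree checks induced by the variables in X[i]."""
    return [sorted(c for c, d in check_degrees(H, row).items() if d % 2)
            for row in X]

def total_checks_of_degree_k(H: Matrix, X: Sequence[Sequence[int]], k: int) -> List[int]:
    """Entry i counts the checks of degree k induced by the variables in X[i]."""
    return [sum(1 for d in check_degrees(H, row).values() if d == k)
            for row in X]

def row_intersect_index(X1: Sequence[Sequence[int]],
                        X2: Sequence[Sequence[int]],
                        k: int) -> List[Tuple[int, int]]:
    """All (i1, i2) such that X1[i1] and X2[i2] share exactly k entries."""
    return [(i1, i2)
            for i1, r1 in enumerate(X1)
            for i2, r2 in enumerate(X2)
            if len(set(r1) & set(r2)) == k]
```

For codes of practical length, a sparse representation (adjacency lists per variable node) would replace the dense row scans used here.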



B. Searching for Cycles of Length $\vartheta$

Since every trapping set contains at least one cycle, the
search for trapping sets always starts with finding cycles in
the Tanner graph. All cycles of length $\vartheta$ that contain variable
node $v$ can be found by performing the following steps.

1) Construct the tree of depth $\vartheta/2 - 1$, taking $v$ as the
root, using the breadth-first search algorithm [50]. Let
$N_1, N_2, \ldots, N_{d_v}$ be sets of leaf nodes at depth $\vartheta/2 - 1$
such that all the nodes in $N_i$ are descendants of the
$i$-th neighbor of $v$. It can be shown that
$|N_i| \leq (d_v - 1)^{t_1} (d_c - 1)^{t_2}$, where

$$t_1 = \frac{\vartheta}{4} - \frac{3}{2}, \quad t_2 = t_1 + 1 \quad \text{if } \vartheta/2 \text{ is odd},$$

$$t_1 = t_2 = \frac{\vartheta}{4} - 1 \quad \text{if } \vartheta/2 \text{ is even}.$$

2) For every pair of nodes $o_i \in N_i$, $o_j \in N_j$ with $i \neq j$,
determine if they share a common neighbor. If so then
a cycle of length $\vartheta$ has been found. If $o_i$ and $o_j$ are
check nodes then the cycle is induced by the variable
nodes that are ancestors of $o_i$ and $o_j$ and their common
neighbor. If $o_i$ and $o_j$ are variable nodes then the cycle
is induced by $o_i$ and $o_j$ as well as the variable nodes
that are ancestors of $o_i$ and $o_j$. The maximum number
of possible pairs $(o_i, o_j)$ is $d_v (d_v - 1) |N_i| |N_j|$.

The two steps described above are executed for every
variable node. To further simplify the search, after all the
cycles containing $v$ are found, $v$ can be marked so that it
is no longer included in Step 1 of the search at other variable
nodes. The complexity of searching for cycles is polynomial
in the degree of the variable nodes and check nodes but
increases only linearly in the code length. Note that our search
algorithm not only counts the number of cycles but also
records the variable nodes that each cycle contains. For this
reason, existing efficient algorithms to count the number of cycles
in a bipartite graph (for example those proposed in [51], [52])
cannot be applied directly.

Example 11: To illustrate the search algorithm, we list the
number of cycles in some popular codes, as well as the
run-times of the algorithm on a 2.6 GHz computer, in Table III.

TABLE III

Number of Cycles of Several LDPC Codes and Run-time of the
Cycle Searching Algorithm on a 2.6 GHz Computer

Code       n      (d_v, d_c)   6-cycles   8-cycles   10-cycles   Run-time (Seconds)
Tanner     155    (3, 5)       0          465        3720        0.007
Margulis   2168   (3, 6)       0          1320       11088       0.23
MacKay     4095   (3, 17)      5183       121238     3038421     28.79
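The cycle search of this subsection can be illustrated with a much simpler routine. The sketch below is not the paired-leaf tree comparison of Steps 1 and 2: it swaps in a plain depth-first enumeration of simple paths, which is easier to verify but less efficient. It does keep the idea of marking a variable node once all cycles through it have been found, and, like the algorithm described above, it records the variable nodes of each cycle rather than only counting cycles. The dense 0/1 matrix input and the function names are assumptions of the sketch.

```python
from typing import Dict, List, Sequence, Set, Tuple

Node = Tuple[str, int]  # ("v", index) for variable nodes, ("c", index) for checks

def tanner_adjacency(H: Sequence[Sequence[int]]) -> Dict[Node, Set[Node]]:
    """Adjacency sets of the Tanner graph of a dense 0/1 parity check matrix."""
    adj: Dict[Node, Set[Node]] = {}
    for c, row in enumerate(H):
        for v, bit in enumerate(row):
            if bit:
                adj.setdefault(("v", v), set()).add(("c", c))
                adj.setdefault(("c", c), set()).add(("v", v))
    return adj

def find_cycles(H: Sequence[Sequence[int]], length: int) -> List[List[int]]:
    """Variable-node lists of all distinct cycles with `length` edges."""
    adj = tanner_adjacency(H)
    seen_edge_sets = set()      # each cycle is identified by its set of edges
    cycles: List[List[int]] = []
    done: Set[Node] = set()     # variable nodes already fully processed

    def extend(path: List[Node]) -> None:
        last = path[-1]
        if len(path) == length:
            if path[0] in adj[last]:  # the path closes into a cycle
                edges = frozenset(
                    frozenset(e) for e in zip(path, path[1:] + path[:1]))
                if edges not in seen_edge_sets:
                    seen_edge_sets.add(edges)
                    cycles.append(sorted(i for kind, i in path if kind == "v"))
            return
        for nxt in adj[last]:
            if nxt not in path and nxt not in done:
                extend(path + [nxt])

    for start in sorted(n for n in adj if n[0] == "v"):
        extend([start])
        done.add(start)         # mark v: later searches skip it
    return cycles
```

Each cycle is reached twice from its first unmarked variable node (once per direction), so the edge-set check is what prevents double counting.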



C. Searching for $(a + 1, b - 1)$ Trapping Sets Generated by
$(a, b)$ Trapping Sets

Let $\mathcal{T}_1$ be an $(a, b)$ trapping set, $\mathcal{T}_2$ be an $(a + 1, b - 1)$
trapping set and let $\mathcal{T}_1$ be a parent of $\mathcal{T}_2$. Further, let $T_1$
be a trapping set of type $\mathcal{T}_1$ in the Tanner graph of a code
$\mathcal{C}$ and assume that $T_1$ generates a trapping set $T_2$ of type
$\mathcal{T}_2$. As discussed in Section IV, $T_2$ is obtained by adjoining
one variable node to $T_1$. The line-point representation of $\mathcal{T}_2$
is obtained by merging two black shaded nodes in the line-point
representation of $\mathcal{T}_1$ with two © nodes in Fig. 3(b).
Therefore, to search for $T_2$, it is sufficient to search for a
variable node that is connected to two odd degree check nodes
in the subgraph induced by the variable nodes in $T_1$.

Let $X$ be a matrix whose each row contains variable nodes
of a $\mathcal{T}_1$ trapping set in the Tanner graph $G$. $H$ is the parity
check matrix which defines $\mathcal{C}$. All $\mathcal{T}_2$ trapping sets can be
found by performing the following steps.

1) Find all odd degree check nodes of all $\mathcal{T}_1$ trapping sets:
$Y_1$ = OddDegreeChecks($H$, $X$).

2) Form $X_1$, a one-column matrix with $n$ rows where the
element in the $i$-th row is variable node $i$.

3) Form a matrix $Y_2$ whose $i$-th row gives all check nodes
neighboring to the variable node $i$:
$Y_2$ = OddDegreeChecks($H$, $X_1$).

4) Find all pairs $(i, j)$ such that the $i$-th row of $Y_1$ and the
$j$-th row of $Y_2$ share 2 common entries:
$Y_3$ = RowIntersectIndex($Y_1$, $Y_2$, 2).

5) If $(i, j)$ is the $l$-th row of $Y_3$, adjoin variable node $j$ to
the $i$-th row of $X$ to form the $l$-th row of $Y_4$.

6) Determine the number of degree one check nodes in
the subgraph induced by variable nodes in each row of
$Y_4$ and eliminate the rows of $Y_4$ that do not have $b - 1$
degree one check nodes. The matrix $Y$ obtained has each
row contain variable nodes that induce a $\mathcal{T}_2$ trapping set
in the Tanner graph of the code:
$Y$ = $Y_4$(TotalChecksOfDegreeK($H$, $Y_4$, 1) == $b - 1$).
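The six steps above can be condensed into a single loop for illustration. The following is a sketch under assumptions, not the matrix-oriented implementation described in the text: the parity check matrix is a dense list of 0/1 rows, indices are 0-based, candidate variable nodes are scanned one by one instead of through RowIntersectIndex, and the function names are illustrative.

```python
from collections import Counter
from typing import List, Sequence, Tuple

def induced_degrees(H: Sequence[Sequence[int]], variables) -> Counter:
    """Degree of each check node in the subgraph induced by `variables`."""
    return Counter(c for c, row in enumerate(H) for v in variables if row[v])

def expand_by_one(H: Sequence[Sequence[int]],
                  X: Sequence[Sequence[int]],
                  b: int) -> List[Tuple[int, ...]]:
    """Adjoin one variable node to each (a, b) set in X; keep (a+1, b-1) results."""
    n = len(H[0])
    checks_of = [{c for c, row in enumerate(H) if row[v]} for v in range(n)]
    found = set()
    for ts in X:
        # Step 1: odd degree checks of the parent trapping set.
        odd = {c for c, d in induced_degrees(H, ts).items() if d % 2}
        for v in range(n):
            # Steps 2-4: variable nodes joined to exactly two odd degree checks.
            if v in ts or len(checks_of[v] & odd) != 2:
                continue
            # Step 5: adjoin v to the parent set.
            cand = tuple(sorted(list(ts) + [v]))
            # Step 6: keep candidates with exactly b - 1 degree one checks.
            deg = induced_degrees(H, cand)
            if sum(1 for d in deg.values() if d == 1) == b - 1:
                found.add(cand)
    return sorted(found)
```

As a usage example, take an eight-cycle on four variable nodes in which each node has one extra degree one check, and add a fifth variable node joined to two of those extra checks; calling `expand_by_one` on the four-node set with `b = 4` returns the five-node set, which has three degree one checks.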

D. Searching for $(a + 2, b)$ Trapping Sets Generated by $(a, b)$
Trapping Sets

Let $\mathcal{T}_1$ be an $(a, b)$ trapping set, $\mathcal{T}_2$ be an $(a + 2, b)$ trapping
set and let $\mathcal{T}_1$ be a parent of $\mathcal{T}_2$. Further, let $T_1$ be a trapping
set of type $\mathcal{T}_1$ in the Tanner graph of a code $\mathcal{C}$ and assume
that $T_1$ generates a trapping set $T_2$ of type $\mathcal{T}_2$. Consider two
variable nodes that share a check node. As discussed in Section
IV, $T_2$ is obtained by adjoining these two variable nodes to
$T_1$. The line-point representation of $\mathcal{T}_2$ is obtained by merging
two black shaded nodes in the line-point representation of $\mathcal{T}_1$
with two © nodes in Fig. 3(c). Therefore, to search for $T_2$, it
is sufficient to search for a pair of variable nodes that share a
common neighboring check node and each of which is connected
to one odd degree check node in the subgraph induced by the
variable nodes in $T_1$.

The search for $(a + 2, b)$ trapping sets is very similar to
the search for $(a + 1, b - 1)$ trapping sets described in the
previous subsection. In particular, the following modifications
should be made:

• In Step 2, $X_1$ is a two-column matrix in which each row contains
a pair of variable nodes that share a common neighboring
check node.

• In Step 3, the $i$-th row of $Y_2$ gives all degree one check
nodes in the subgraph induced by the variable nodes in the
$i$-th row of $X_1$.

• In Step 5, variable nodes in the $j$-th row of $X_1$ are
adjoined to the $i$-th row of $X$.

• In Step 6, $b - 1$ is replaced by $b$.

Since Step 4 does not take into account the case in which
two check nodes of a new variable node are merged with two
odd degree check nodes of $T_1$, the subroutine IsTrappingSet
is used afterward to eliminate rows of $Y$ that do not contain
variable nodes that form a trapping set.

E. Remarks 

The above search procedures may not differentiate among
different $(a, b)$ trapping sets. For example, all (5, 3){1} and
(5, 3){2} trapping sets are found if they are searched for as
trapping sets generated by the (4, 4) trapping sets. Similarly,
all (7, 3){2} and (7, 3){3} are found as trapping sets generated
by the (6, 4){1} trapping sets. If searching for a specific type
of trapping set is required, then it is necessary to further
analyze the induced subgraph. For example, notice that the
(5, 3){1} trapping set is a union of a 6-cycle and an 8-cycle
sharing two variable nodes, while the (5, 3){2} trapping set is a
union of two 8-cycles sharing three variable nodes. Therefore,
to search for all (5, 3){2} trapping sets from a list of (4, 4)
trapping sets, one would find all pairs of (4, 4) trapping sets
that share three variable nodes, using the RowIntersectIndex
subroutine. The union of each pair of (4, 4) trapping sets is
then a set of five variable nodes. Each set of variable nodes
forms a (5, 3){2} trapping set if its induced subgraph contains
three degree one check nodes. Similarly, all (7, 3){2} can
be found by noticing that they are unions of two (6, 4){1}
trapping sets sharing five variable nodes.
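The pairing rule for (5, 3){2} trapping sets described above can be sketched as follows. The function names and the dense 0/1 matrix input are assumptions of the sketch; the list of (4, 4) trapping sets is taken as given, for example as the output of an eight-cycle search on a girth 8 code.

```python
from collections import Counter
from itertools import combinations
from typing import List, Sequence, Tuple

def degree_one_checks(H: Sequence[Sequence[int]], variables) -> int:
    """Number of degree one checks in the subgraph induced by `variables`."""
    deg = Counter(c for c, row in enumerate(H) for v in variables if row[v])
    return sum(1 for d in deg.values() if d == 1)

def find_53_type2(H: Sequence[Sequence[int]],
                  sets_44: Sequence[Sequence[int]]) -> List[Tuple[int, ...]]:
    """Unions of two (4,4) sets sharing three variable nodes, with 3 degree one checks."""
    found = set()
    for s, t in combinations([frozenset(x) for x in sets_44], 2):
        if len(s & t) == 3:          # the two eight-cycles share three variable nodes
            union = s | t            # five variable nodes in total
            if degree_one_checks(H, union) == 3:
                found.add(tuple(sorted(union)))
    return sorted(found)
```

Several pairs of (4, 4) trapping sets can produce the same union, which is why the results are collected in a set before being returned.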

Example 12: We end this section by giving the statistics of
small trapping sets present in the random MacKay code of
length 4095 along with the running times of the algorithms.
These are given in Table IV. Note that the numbers of six,
eight and ten cycles present in the Tanner graph of this code
are given in Table III. All the searches were performed on a
2.6 GHz computer.

TABLE IV

Number of Small Trapping Sets in MacKay Random Code of
Length 4095

Trapping Sets             Total     Run-time (Seconds)
(5,3){1}                  19617     0.59
(5,3){2}                  3259      12.18
(6,2){1}                  167       0.16
(7,1){1}                  2         0.05
(6,4){1}                  299636    55.78
(7,3){2} and (7,3){3}     56309     4.21

Appendix B
Relations to Array LDPC Codes

We show that the class of codes described in Section
III contains array LDPC codes when the Galois field is a
prime field. The parity check matrix of an array LDPC code
described by Fan in [9] is a $\gamma \times \rho$ subarray of the matrix $\mathcal{H}_{arr}$
of the form

$$\mathcal{H}_{arr} = \begin{bmatrix} I & I & I & \cdots & I \\ I & J & J^{2} & \cdots & J^{q-1} \\ I & J^{2} & J^{4} & \cdots & J^{2(q-1)} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ I & J^{q-1} & J^{2(q-1)} & \cdots & J^{(q-1)(q-1)} \end{bmatrix} \qquad (12)$$

where $q$ is an odd prime and $J$ is a $q \times q$ circulant matrix. We
now show that $\mathcal{H}_{arr}$ can be obtained by permuting rows and
columns of $\mathcal{H}$ in (5); hence array LDPC codes are contained
in the new class of LDPC codes proposed in this paper.

Let $q$ be an odd prime. Since the additive group of 
GF(q) is cyclic, we can write GF(q) $= \{\beta_{-\infty} = 0, \beta_0 = 1, \beta_1, \ldots, \beta_{q-2}\}$, 
where $\beta_{i+1} = \beta_i + 1$ for $0 \le i \le q-3$ 
and $\beta_{-\infty} = \beta_{q-2} + 1$. Permute rows and columns of $\mathcal{L}$ to 
obtain $\mathcal{L}_\beta$, a Latin square that has $(\beta_{-\infty}, \beta_0, \beta_1, \ldots, \beta_{q-2})$ as 
indices of rows from top to bottom and columns from left to 
right. It can be shown that there exists a permutation matrix 
$O$ such that $\mathcal{L}_\beta = O\mathcal{L}O'$. Replace $\mathcal{L}$ by $\mathcal{L}_\beta$ and let $\mathcal{M}_\beta$ be 
the set of images of GF(q) under $f$. It can be seen that: 

• $\mathcal{M}_\beta$ is the set of circulant permutation matrices of size 
$q \times q$. 

• $f(\beta_{-\infty}) = I$, the $q \times q$ identity matrix. 

• Propositions 1 and 2 and Theorem 1 still hold when $\alpha_t$ is 
replaced with $\beta_t$ and $P$ is replaced with $P_\beta = OPO'$. 

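Finally, permute rows and columns of $W$ in (4) to obtain 
$W_\beta = OWO'$. Then $W_\beta$ has the form: 

$$
W_\beta =
\begin{bmatrix}
0 & 0 & 0 & \cdots & 0 \\
0 & 1 & \beta_1 & \cdots & \beta_{q-2} \\
0 & \beta_1 & \beta_1\beta_1 & \cdots & \beta_1\beta_{q-2} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & \beta_{q-2} & \beta_{q-2}\beta_1 & \cdots & \beta_{q-2}\beta_{q-2}
\end{bmatrix}
$$

and hence $f(W_\beta) = \mathcal{H}_{arr}$, with $J = f(\beta_0) = f(1)$. 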
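
As a small numerical illustration of the array structure in (12), the following Python sketch builds a $\gamma \times \rho$ subarray of $\mathcal{H}_{arr}$ and checks that its Tanner graph has no cycle of length four. It rests on assumptions not stated in the text: $J$ is taken to be the cyclic shift by one position, the subarray is the top-left one, and the function names are hypothetical. It builds the blocks $J^{ij}$ directly and does not go through the map $f$.

```python
def circulant_power(q, k):
    """Return J^k, where J is assumed to be the q x q cyclic-shift matrix."""
    k %= q
    return [[1 if (r + k) % q == c else 0 for c in range(q)] for r in range(q)]

def array_parity_check(q, gamma, rho):
    """Top-left gamma x rho subarray of H_arr: block (i, j) is J^(i*j)."""
    rows = []
    for i in range(gamma):
        blocks = [circulant_power(q, i * j) for j in range(rho)]
        for r in range(q):
            # Concatenate row r of every block in this block row.
            rows.append([x for b in blocks for x in b[r]])
    return rows

def has_four_cycle(H):
    """A 4-cycle exists iff two rows of H share more than one column."""
    supports = [{c for c, x in enumerate(row) if x} for row in H]
    for a in range(len(supports)):
        for b in range(a + 1, len(supports)):
            if len(supports[a] & supports[b]) > 1:
                return True
    return False

# Example with the odd prime q = 5, column weight 3 and row weight 5.
H = array_parity_check(q=5, gamma=3, rho=5)
four_cycle_free = not has_four_cycle(H)
```

Here `H` is a 15 x 25 binary matrix with column weight 3 and row weight 5. The pairwise row check is quadratic in the number of rows, which is adequate for a toy example only.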
Appendix C 

Quasi-cyclic LDPC Codes from Cyclic Groups of 
Permutation Matrices 

In [5], Lan et al. give the construction of a class of QC 
LDPC codes based on the multiplicative groups of Galois 
fields. We briefly describe this class of codes, but with the 
formulation introduced in this paper. 

Consider the Galois field GF(q), where $q$ is a power of 
a prime. Let $\mathcal{L} = [l_{i,j}]_{i,j \in Q}$ denote a Latin square defined 
by the Cayley table of $(Q, \otimes)$, where $Q = \{1, \alpha, \ldots, \alpha^{q-2}\}$ 
and $\otimes$ is the multiplicative operation of GF(q), i.e., $l_{i,j} = i \times j$. 
Let $\mathcal{M} = \{M_0, M_1, \ldots, M_{q-2}\}$ be the set of images 
of elements of $Q$ under $f$. We give the following statements 
without proofs: 

• $M_0 = I$ is the $(q-1) \times (q-1)$ identity matrix. 

• $f(\alpha^{t_1}\alpha^{t_2}) = f(\alpha^{t_1})f(\alpha^{t_2})$. 

• $\mathcal{M}$ is the cyclic group of permutation matrices of size 
$(q-1) \times (q-1)$ under ordinary matrix multiplication. 

Define $W$ and $\mathcal{H}$ as in (4) and (5), where $(Q, \otimes)$ is the 
multiplicative group of GF(q). The following theorem gives 
the necessary and sufficient condition on $W$ such that the 
Tanner graph corresponding to $\mathcal{H}$ has girth at least 6. We omit 
the proof since it is very similar to the proof of Theorem 1. 

Theorem 3 (Cross-multiplication Constraint): The Tanner 
graph corresponding to $\mathcal{H}$ contains no cycle of length four 
iff $w_{i_1,j_1} \times w_{i_2,j_2} \neq w_{i_1,j_2} \times w_{i_2,j_1}$ for any $1 \le i_1, i_2 \le \mu$; 
$1 \le j_1, j_2 \le \eta$; $i_1 \neq i_2$; $j_1 \neq j_2$. 

For any pair $(\gamma, \rho)$ of positive integers with $1 \le \gamma, \rho \le q$, 
let $H$ be a $\gamma \times \rho$ subarray of $\mathcal{H}$. Then $H$ is a $\gamma(q-1) \times \rho(q-1)$ 
matrix over GF(2) which is also free of cycles of length 4. $H$ has 
constant column weight $d_v = \gamma$ and row weight $d_c = \rho$. The 
null space of $H$ gives a regular structured LDPC code C of 
length $\rho(q-1)$ with rate at least $R = (\rho - \gamma)/\rho$ [1]. 

Remarks: We can adjoin the zero element of GF(q) to the 
set $Q$ to obtain $Q' = Q \cup \{0\}$ and define $f(0) = Z$, the all-zero 
matrix of size $(q-1) \times (q-1)$. Theorem 3 still holds 
for $W$ defined on $Q'$. In this case, the cross-multiplication 
constraint is equivalent to the $\alpha$-multiplied row constraints 
given in [5]. It is almost obvious that Latin squares obtained 
from the Cayley table of the additive group of GF(q) satisfy 
the cross-multiplication constraint. This concept is used in | 
to obtain a class of QC LDPC codes on Latin squares. 
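
The cross-multiplication constraint of Theorem 3 is easy to test numerically when the field is a prime field. The following Python sketch is only an illustration under that assumption: integers modulo a prime `p` stand in for GF(p), so it does not cover prime-power fields, and the function name is hypothetical. It checks the additive Cayley table mentioned in the Remarks, and, for contrast, a multiplication table, which violates the constraint for every choice of rows and columns.

```python
from itertools import combinations

def satisfies_cross_multiplication(W, p):
    """Check w[i1][j1]*w[i2][j2] != w[i1][j2]*w[i2][j1] (mod p)
    for all pairs of distinct rows and distinct columns of W.

    Assumption: p is prime, so integers mod p model GF(p).
    """
    for i1, i2 in combinations(range(len(W)), 2):
        for j1, j2 in combinations(range(len(W[0])), 2):
            lhs = (W[i1][j1] * W[i2][j2]) % p
            rhs = (W[i1][j2] * W[i2][j1]) % p
            if lhs == rhs:
                return False
    return True

p = 7
# Cayley table of the additive group of GF(7); entries lie in Q' = Q with 0 adjoined.
additive = [[(i + j) % p for j in range(p)] for i in range(p)]
# Cayley table of the multiplicative group of GF(7), shown for contrast.
multiplicative = [[(i * j) % p for j in range(1, p)] for i in range(1, p)]

additive_ok = satisfies_cross_multiplication(additive, p)
multiplicative_ok = satisfies_cross_multiplication(multiplicative, p)
```

For the additive table the difference of the two products reduces to $(i_1 - i_2)(j_2 - j_1)$, which is nonzero modulo a prime whenever the rows and the columns are distinct, so `additive_ok` is `True` while `multiplicative_ok` is `False`.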



Appendix D 

Minimum Distance of Codes Constructed from 
GF($2^\vartheta$) 

The structured LDPC codes proposed in this paper include 
codes whose length is a power of 2, which allow hardware 
implementation to be further simplified. Unfortunately, the 
minimum distances of these codes are upper bounded by 8. 

Theorem 4: Let C be an LDPC code defined in Section III 
with $\gamma = 3$. If $q = 2^\vartheta$ where $\vartheta \in \mathbb{N}$, $\vartheta > 1$, then the minimum 
distance of C is at most 8. 

Proof: Let $H$ be the parity check matrix of C, where C is 
an LDPC code defined in Section III with $q = 2^\vartheta$. We know 
that $H = f(W)$, where $W$ is a matrix over GF(q). Let $W'$ be 
a matrix formed by any two columns of $W$. WLOG assume 
that 

Then the following equations hold since $\alpha^t + \alpha^t = 0$ $\forall \alpha^t \in$ GF(q) 

Recall that $f(\alpha^t)$ is a permutation matrix whose rows and 
columns are indexed using elements of GF(q), and that its 
entries $m_{i,j} = 1$ if and only if $i - j = \alpha^t$. Since the above 
equations hold, there exist eight variable nodes in the Tanner 
graph corresponding to $f(W')$ with line-point representation 
shown in Fig. 8(b). In other words, the Tanner graph of C 
contains the (8, 0){1} weight-eight codewords. ■ 
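
The existence of a weight-eight codeword can be checked numerically with a short Python sketch. This is a sketch under several assumptions and is not necessarily the construction used in the proof: elements of GF($2^\vartheta$) are represented as integers whose addition is bitwise XOR, variable node $j$ in a block column with entry $a$ is connected to check node $i = j + a$ (from $i - j = a$), and the explicit choice of eight variable nodes below is one convenient choice made for this example. The function names are hypothetical.

```python
def weight_eight_support(col_a, col_b, x=0):
    """Pick four variable nodes in each of two block columns.

    col_a, col_b: the three entries of two columns of W (gamma = 3),
    as integers representing elements of GF(2^s); addition is XOR.
    """
    d1, d2, d3 = (a ^ b for a, b in zip(col_a, col_b))
    s1 = {x, x ^ d1 ^ d2, x ^ d1 ^ d3, x ^ d2 ^ d3}
    s2 = {x ^ d1, x ^ d2, x ^ d3, x ^ d1 ^ d2 ^ d3}
    return s1, s2

def all_checks_satisfied(col_a, col_b, s1, s2):
    """True if every check node is hit an even number of times."""
    for a, b in zip(col_a, col_b):  # one block row per pair of entries
        hits = {}
        for j in s1:
            hits[j ^ a] = hits.get(j ^ a, 0) + 1  # check index i = j + a
        for j in s2:
            hits[j ^ b] = hits.get(j ^ b, 0) + 1  # check index i = j + b
        if any(n % 2 for n in hits.values()):
            return False
    return True

# Hypothetical example over GF(8): two columns of a 3-row matrix W.
col_a, col_b = (1, 2, 4), (3, 5, 2)
s1, s2 = weight_eight_support(col_a, col_b)
is_codeword = all_checks_satisfied(col_a, col_b, s1, s2)

# Distinct supports obtained as the starting element x ranges over GF(8).
supports = {frozenset(weight_eight_support(col_a, col_b, x)[0]) for x in range(8)}
```

In this example the three column differences are distinct, so `s1` and `s2` each contain four variable nodes and `is_codeword` is `True`. Varying `x` over GF(8) yields two distinct supports for this pair of columns.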

Clearly, the Tanner graph of C contains trapping sets that 
are parents of the (8,0){1} codeword, i.e., it contains the 
(7, 3){2} and (6,4){1} trapping sets shown in Fig. |^a) and 
p|[b)| and eight cycles. Consequently, the Tanner graph of C 
has girth at most 8. From the proof of Theorem 4 it can be 
seen that the number of (8,0){1} codewords of C is lower 
bounded by $\binom{\rho}{2} 2^{\vartheta-2}$, where $\rho$ is the row weight of C. We 
also note that for all the column-weight-three codes of girth 
$g = 8$ that we have constructed from GF($2^\vartheta$), this lower bound 
gives the exact number of (8, 0){1} codewords. Moreover, we 
found no (8, 0){2} codewords (the line-point representation of 
the (8, 0){2} codeword is shown in Fig. £ c)i in these codes. 
Therefore, the total number of weight-eight codewords in these 
codes is $\binom{\rho}{2} 2^{\vartheta-2}$. 